I am happy to introduce you to the 5th incarnation of Hook’s Humble Homepage – and the most comprehensive incarnation ever.

This is a very long post that starts with some history of my blog, continues with the reasons why I migrated and the technicalities of the migration(s), and ends with an overview of what is new on this website now. If you are interested in just a specific part, I suggest you scroll until you reach it. In case you are reading this in a browser (as opposed to a feed aggregator), you might find the new ToC in the sidebar useful for navigation.

From static HTML to CMS and back again

As with any migration, this was a very interesting walk down memory lane – and given the sheer mass of this one, even more so.

The following sub-chapters describe those times and the reasons why I switched to the next solution.

1st incarnation – first steps with HTML

The earliest archives I could find of my blog are those from 2003 on our high school’s student GNU/Linux server, called Lenin, although I am fairly sure I had a homepage of some sort for at least two years before the Internet Archive started to log it – especially since in 2003 I was not in high school anymore and I had been using that box since around 1997 or so.

In those years I wrote static HTML by hand, using first Pico and later JOE and JED. I remember how very happy I was when I learnt about CSS (version 2) – it made such an amazing difference! Before that we had to define every font, colour, width … everything inside the HTML tags themselves – what a chore!

The time spent with other Lenin users was very formative for my future – I was introduced both to GNU/Linux and to the whole Free Software idea and community. As a result I joined LUGOS in the late 90’s and became more active in the movement.

While its name might be controversial (which server’s name is not?), Lenin produced some of the best system admins that Slovenia has to offer (not me, obviously).

Since Lenin is long dead and the few blog posts logged by the Internet Archive are all that is left of it, I have not bothered migrating them. (I might decide otherwise in the future.)

Temporary blog to bridge the gap

By 2004 I decided that keeping a blog alive in pure hand-written HTML was a bit too much of a chore and so I went with the trend and opened a blog at BlogSpot (later renamed by Google to Blogger). Around that time Lenin was also slowly dying.

In 2005 I also started posting bi-lingual (Slovenian & English) news about FOSS on the website of Kiberpipa, of which I have been a member since 2001.

My BlogSpot blog was never intended to be kept alive for long, so I do not even count it as one of the incarnations.

The first blog post there is also the oldest post migrated to this incarnation.

2nd incarnation – wonders of CMS and troubles of own server

Wanting more power over my own creations in 2005 I registered http://matija.suklje.name and started running Drupal 4 with MySQL on my very own server.

The server itself was an (already then) old Pentium MMX running ClarkConnect (now known as ClearOS) that was churning happily under my desk.

This incarnation was probably the most daunting for me – I had to learn to install and administrate a database, a web server and a PHP-based CMS. Luckily ClarkConnect had a very user-friendly web GUI for its time, so I somehow managed to keep it all alive. I quite enjoyed how flexible Drupal was and kept using it for many years.

What I was still too wet behind the ears to manage was regularly updating Drupal and migrating (or backing up) the database after I abandoned that server. What I also did not master back then was running a proper mail server – I had to give up on that after months of battling with spammers.

After about 3 years I had to abandon that server (lovingly called Dryades) due to its noise in the room and slowly dying hardware.

I migrated all the blog posts into Pelican (more on that below), but to see how it looked, you can check it out on the Internet Archive.

3rd incarnation – CMS on a community server

Luckily, by 2008 I had already had access to Kiberpipa’s web server Dogbert for quite some time, so the decision where to migrate to was fairly simple.

The admins in Kiberpipa/Cyberpipe were also very helpful in setting up the MySQL database for Drupal, even if most people there used WordPress.

A nice thing about this install was that Dogbert had a capable team administrating it and that they took care of all the stuff apart from Drupal itself. A less fun thing was that I had only limited access to the MySQL database, DNS etc. and depended on other people’s free time when I needed something.

But as this was the longest-lasting incarnation, the positive side obviously outweighed the negative. This time I learnt to – and even managed to – regularly update my Drupal install, and was toying with the idea of expanding my website from a mere blog into something bigger. As you can see, things went the opposite way.

Already at this point in time I was trying to fight off spam comments with all the means I could find, as some popular posts had caught the eye of the dark side of the net.

4th incarnation – on my own again, trying something lighter

Inspired by the FreedomBox project, in 2012 I got myself a DreamPlug and decided to host my own stuff again. The new server was (and still is) called Ganesha and is powered by Gentoo GNU/Linux.

So I started looking for some cool blogging system that can run on such a low-power device.

After weeks of exploring different alternatives I decided to go with Habari – it still needs PHP, but it is quite undemanding and works with almost any popular SQL DB (I chose SQLite).

In September that year I migrated to Habari – the migration from Drupal went pretty smoothly and I really liked the minimalist, yet very customisable interface.

At this point I have to mention that neither the migration away from Drupal nor the one away from Habari had anything to do with the quality of that software – in fact, I would still warmly recommend both! – but merely with my use case at the time not being optimal for it.

The reason why I left Habari was simply that the spam bots had by now started hammering my poor little Ganesha so badly that it was effectively equivalent to a DDoS attack. After trying all the tricks that I felt comfortable with, the spamming still crippled my system too much. So I decided to do something drastic – remove comments altogether and go for static HTML again.

5th incarnation – and full-circle back to static HTML

As mentioned above, at some point the spam got the better of me and I decided to go back to something where spam bots had no possibility to render my server useless again.

Already during the past migration I was looking at Pelican as one of the possible contenders.

Back then I decided against it because it was not dynamic and depended on 3rd-party commenting services. But after a bit more than a year, this sounded more and more like a feature than a bug for my poor Ganesha. Another reason why I chose it is that I am way more familiar with Python and MarkDown than with SQL and PHP.

So much for the history – the next section is all about the migration itself.

The migration itself

Whew! This had to be the most arduous migration I have done so far.

Part of the blame goes to myself, since I had to fix a few issues that had accumulated over time, as well as deciding to redesign my categories and tags.

It was a lot of work, but it was well worth it, as finally I have all my blog posts – dating all the way back to 2004 – in one place. Not only that, but I have them stored and backed up in static HTML, which means I can just keep that output as an archive pretty much forever. No more pesky migrations! ☺

Habari to WordPress

First I exported the content from Habari using the exportsnapshot plugin.

In order not to muck up my data, I copied the exported snapshot into a working folder – it’s the newest and probably biggest file in user/cache/ (in my case it was called user/cache/42113b6a57f092f8074913435ff86768.008cd18209ee2568388fd663955317b0.cache).

Then I had to open it in a text editor and delete the s:1594235:" at the very beginning of the file and the "; at the very end.
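As a sketch, the same surgery can also be scripted with sed – the byte count after s: differs for every export, so the pattern has to be generic. The file name and sample content below are hypothetical stand-ins:

```shell
# Stand-in snapshot file; in practice this is the copy of the real
# export made above (file name and content here are hypothetical).
snapshot=export.wxr
printf 's:26:"<rss>example content</rss>";' > "$snapshot"

# Drop the leading s:<length>:" prefix and the trailing ";
sed -i 's/^s:[0-9]*:"//' "$snapshot"
sed -i 's/";$//' "$snapshot"
```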

Now I was finally left with a proper WXR XML[2].

WordPress to Pelican

This is where things really started to get complicated. I will keep it short and omit the whole trial-and-error session needed to figure out what works.

To convert the WXR into Pelican with Markdown, I (roughly) followed these instructions on how to migrate WordPress to Pelican, since I found it more stable than the official pelican-import and it kept more data as well. Note that due to using an old version of the base code, the diffs in Kevin’s How-To refer to different lines than the current version. I found it fairly trivial to find the relevant lines though. At the moment of writing, add ~30 to the line numbers to find the relevant bits.

As there were still some inconsistencies present, I had to fix the output of the script to make it importable. I cannot remember everything I did, but I know that I had to capitalise all the dictionary keys in the headers (e.g. author: → Author:), fix the tags key (tag: → Tags:) and the time-stamp (e.g. date: 2010-12-25T20:30+0000 → Date: 2010-12-25 20:30). Needless to say, my RegExp-fu improved a lot.
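Those header fixes mostly lend themselves to automation; here is a rough sketch with sed, assuming Pelican-style metadata headers at the top of each file (the exact keys in your export may differ):

```shell
# Capitalise the metadata keys in place across all posts.
sed -i 's/^author:/Author:/' *.markdown
sed -i 's/^tag:/Tags:/' *.markdown
# Reshape the ISO-ish time-stamp into Pelican's "Date: YYYY-MM-DD HH:MM".
sed -E -i 's/^date: ([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2})\+0000/Date: \1 \2/' *.markdown
```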

There are several scripts out there for migrating from WordPress-style to MarkDown-style footnotes, but when I tested them they misfired a lot in my case, so I just changed those by hand.

After that I had more-or-less usable content and could start improving it.

For compatibility with Habari’s way of naming posts I added the appropriate settings to pelicanconf.py:

ARTICLE_URL = u'{slug}'
ARTICLE_SAVE_AS = u'{slug}.html'
PAGE_URL = u'{slug}'
PAGE_SAVE_AS = u'{slug}.html'

and to /etc/nginx/nginx.conf:

    # Redirect to be able to omit .html in URLs
    try_files $uri $uri/ $uri.html =404;

Blogger/BlogSpot to Pelican

I left very few blog posts on BlogSpot before I migrated to Drupal, so I just used Pelican’s own Swiss-army-knife importer and decided to fix all the inconsistencies myself.

The importing bit is very easy. I just needed to run[5]:

pelican-import --feed -m markdown http://silver_hook.blogspot.com/feeds/posts/default

The exported MarkDown files first needed to be properly renamed from .md[4] to .markdown, so Vim could properly highlight them.
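A simple shell loop can handle the renaming in bulk – a sketch, assuming all the exported posts sit in one directory:

```shell
# Rename every .md file to .markdown, keeping the base name.
for f in *.md; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    mv -- "$f" "${f%.md}.markdown"
done
```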

I had to change the Author: field for which I used:

sed -i "s/Hook (noreply@blogger.com)/Matija Šuklje/" *.markdown

After that I did not bother with sed and RegExp, but cleaned up all the mess by hand. It did not take too much time, since there were just 14 files to clean up and tag.

Here is a short list of what I had to fix:

  • remove obsolete newlines and spaces between paragraphs (one empty line is enough, thank you);
  • remove \ in front of every special character that does not need escaping;
  • change all the <span> elements into MarkDown equivalents;
  • make the paragraphs flow again by getting rid of text wrapping at column ~72 (a nit-pick really).
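The backslash removal, at least, could also be automated; a hedged sketch – the character set here is an assumption, extend it to whatever the importer escaped in your posts:

```shell
# Strip backslash-escapes before characters that need no escaping in
# running MarkDown text (adjust the bracket set to taste).
sed -i 's/\\\([()._-]\)/\1/g' *.markdown
```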

After that I also had to fix or remove dead links pointing to pictures that I was no longer hosting on http://poljane.net/~hook (since neither the domain nor the server exists anymore).

Internet Archive to Pelican

Using the importer

Using pelican-import to migrate content from the Internet Archive was pretty much the same as from BlogSpot. I just fed it the few copies of my RSS feed that were archived there.

The only notable difference was one bizarre occurrence where a blog post included all the other posts from the RSS feed, except the one it should have.

The rest by hand

For the rest of the content I had to navigate through the Internet Archive and copy-paste and edit by hand.

This was the single most tedious grind of the whole migration. Well worth it though, as now I have the full set of my blog posts from 2004 onward in one single place – backed up, as well.

Modifications over stock install and migrated data

Since my SQL and PHP skills are still very poor, the migration to a static HTML generator written in a language that I understand[3] has given me additional motivation to do more than just general house-keeping.

General cleanup

After all the migration was done, some clean-ups were in order to improve styling and make it more coherent.

Much of this I did by hand, but two of the fixes I managed to automate.

In order to change three dots into a proper ellipsis:

sed -ri "s/(\.){3}/…/g" *.markdown

In order to change all the hyphens that were used to separate parenthetical statements into proper en dashes:

sed -i "s/ - / – /g" *.markdown

New tags and categories

During the past decade I used several different styles of categories and later tags.

With the migration of all these posts into a single blog, I decided to make it more coherent and settled on a combination of both tags and categories.

The categories are four in number and that is to stay that way – they are here to divide the blog into its main areas:

Anima
Everything relating mainly to life – from deep thoughts to mundane moments.
Ars
Everything relating to art – books, films, music, games &c.
Ius
Everything relating to law – including copyright, patents, criminal law and all other bodies of law.
Tehne
Everything that is of a more technical nature – diverse HowTos, new exciting IT solutions as well as just updates of this blog software.

I chose Latin (and Greek) names for them because they sound more neutral and because their alphabetical order also matches the order of technicality – from non-technical to very technical content. The Ars–Tehne dichotomy[8] is intentionally present, as it is sometimes a complicated question which aspect wins over.

The tags provide additional information on what the article is about, mostly data that could not be guessed from the title of the post itself. Currently I use just under 30 tags and while a few more are bound to pop up in time, I do not want to have too many.

Needless to say re-tagging and re-categorising all the posts took a lot of time, but I am very happy it is done now.

Modifications on top of the out-of-the-box Pelican install

The theme I decided to use is called Elegant and I think rightly so.

On top of that I have added a few plugins to enable some optional features (of the theme), namely:

  • neighbors – to help visitors read next and previous articles;
  • related_posts – to help people navigate related posts;
  • post_stats – I aim at writing understandable articles, so I decided to keep score of that by showing the amount of time an average reader should take to read the article as well as the Flesch-Kincaid index to keep track of the reading ease of my texts;
  • tipue_search – to search through all the articles;
  • extract_toc – as sometimes I write very long articles, it is very useful to have a table of contents in the sidebar to navigate;
  • sitemap – could prove more useful for robots than humans;
  • gallery – after ten years I may actually start posting (smaller) galleries as blog posts to accompany certain events;
  • multi_part – for easier navigation of blogs that form parts of a coherent series.

I decided to disable comments for now and maybe add them later on. Possible contenders are Isso and Static Pelican Comments, but more likely I will decide to keep the blog static and move commenting into a service that is better suited for discussions and preferably P2P and anonymous. While StatusNet and PumpIO seem like obvious choices, I am hoping for something more in line with the Social Desktop idea. So far Secure Share and Twister sound intriguing, but let us see what the future brings…

As I actually understand (some) Python, HTML and CSS, I intend to help a little to develop the Elegant theme further, add new features to it and make it W3C compliant.

…maybe I need to add more monkeys ☺

License update

At the end of November, Creative Commons updated their CC licenses (except CC0) to version 4.0, listing as the main changes:

  • a more global license – not having to bother with many national ports is a cool thing, great to see CC settling for a single international version;
  • rights outside the scope of copyright – normal CC licenses now explicitly include sui generis database rights (e.g. my shaving results DB); CC 4.0 also now explicitly excludes patents and trade marks from the scope of the license;
  • common-sense attribution – attribution is now possible by linking to a separate page for attribution information, which was a common practice already before, but not explicitly mentioned;
  • enabling more anonymity, when desired – while on the one hand CC 4.0 added the waiver of moral rights (to the limited extent necessary) to all licenses, as well as the no-endorsement clause, on the other hand it included some moral rights in the license as an explicit option for the original author/licensor to demand their name be removed from adaptations or even verbatim reproductions of their work;
  • 30-day window to correct license violations – it is nice to give licensees a more realistic provision for coming back into compliance with the license;
  • increased readability – the CC-BY does not seem any shorter in 4.0 than in 3.0, but the text flows better with the division into (sub)sections and a better solution than all-caps for the liability waivers etc.;
  • clarity about adaptations – adaptations and modifications of all CC 4.0 licensed works now need to be indicated as such; it is also made clearer that adaptations of CC-BY and CC-BY-NC licensed works can be licensed under any license, as long as it does not prevent remixing the original work; there are also some special provisions for CC-SA licenses.

Apart from those, by using CC 4.0 the licensor explicitly waives rights to enforce, and grants permission to circumvent, TPM (e.g. so called “DRM”). Especially for musicians the new waiving of rights to collecting societies should be an interesting novelty.

I took the opportunity to read CC-BY 4.0 and update my blog’s license to it.

External helping tools

FCron as a reminder

So as not to leave any drafts unattended, I have set up an FCron job to send an e-mail with a list of all unfinished drafts to my private address.

It is a very easy hack really, but I find it quite useful for keeping track of things – find the said fcronjob below:

%midweekly,mailto(matija@suklje.name) * * cd /var/www/matija.suklje.name/content/ && ack "Status: draft"

ownCloud as online editing tool

What I am trying to do is to be able to add, edit and delete content from Pelican from anywhere, so whenever inspiration strikes I can simply take out my phone or open up a web browser and create a rough draft. Basically a make-shift mobile app.

I decided that the easiest way to do this was to access my content over WebDAV via the ownCloud instance that runs on the same server.

On a GNU/Linux server this is done very easily by just linking Pelican’s content folder into your ownCloud user’s file system – e.g:

ln -s /var/www/matija.suklje.name/content/ /var/www/owncloud/htdocs/data/hook/files/Blog

In order to have the files writable over WebDAV, they need to be writable by the user that PHP and the web server run under – e.g.:

chown -R nginx:nginx /var/www/owncloud/htdocs/data/hook/files/Blog/

As a mobile client I use ownNotes, because it runs on my Nokia N9 and supports MarkDown highlighting out-of-the-box.

All I needed to do in ownNotes is to provide it with my ownCloud log-in credentials and state Blog as the "Remote Folder Name" in the preferences.

But before I can really make use of ownNotes, I have to wait for it to start properly managing file-name extensions.

An added bonus is that in the future I could use ownCloud’s web-based text editor to collaboratively write content for Pelican.

An additional added bonus is that the Activity feed of ownCloud keeps a log of when which file changed or was added.

Automate page generation

To have pages constantly and automatically generated, there is the option of calling pelican --autoreload, and I did consider turning it into an init script, but decided against it for two reasons:

  • it consumes too much CPU power just to check for changes;
  • as a full (re-)generation of this blog takes about 6 minutes[7] on my poor ARM server, I did not want to hammer my system every time I save a minor change.

What I did instead was to create an fcronjob to (re-)generate the website every night at 3 in the morning (and send a mail to root’s default address), under the condition that blog posts have either been changed or added since yesterday (written in Zsh):

%nightly,mail * 3 cd /var/www/matija.suklje.name && posts=(content/**/*.markdown(Nm-1)); if (( $#posts )) LC_ALL="en_GB.utf8" make html

Update: the above command is changed to use Zsh; for the old sh version, use:

%nightly,mail * 3 cd /var/www/matija.suklje.name && [[ `find content -iname "*.markdown" -mtime -1` != "" ]] && LC_ALL="en_GB.utf8" make html

In order to have the file permissions on the content directory always correct for ownCloud (see above), I changed the Makefile a bit. The relevant changes can be seen below:

html:
    chown -R nginx:nginx $(INPUTDIR)
    $(PELICAN) $(INPUTDIR) -o $(OUTPUTDIR) -s $(CONFFILE) $(PELICANOPTS)

clean:
    [ ! -d $(OUTPUTDIR) ] || rm -rf $(OUTPUTDIR)

regenerate:
    chown -R nginx:nginx $(INPUTDIR)
    $(PELICAN) -r $(INPUTDIR) -o $(OUTPUTDIR) -s $(CONFFILE) $(PELICANOPTS)

Why not Git and hooks?

The answer is quite simple: because I do not need it and it adds another layer of complication.

I know many use Git and its hooks to keep track of changes as well as for backups and for pushing from remote machines onto the server. And that is a very fine way of running it, especially if there are several users committing to it.

But for the following reasons, I do not need it:

  • I already include this page with its MarkDown sources, settings and the HTML output in my standard RSnapshot backup scheme of this server, so no need for that;
  • I sometimes want to draft my posts on my mobile, and Git on a touchscreen is just annoying to use[6];
  • this is a personal blog, so the distributed VCS side is just overhead really;
  • there is no added benefit to sharing the MarkDown sources online, if all the HTML sources are public anyway.

Statistics

With the 5th incarnation, Hook’s Humble Homepage has come into its teens.

It is an interesting coincidence that I started writing this blog while I was still in my late teens (I am 30 now).

The first archived blog post dates back to 6th of November 2003 and the oldest migrated one to 21st July 2004.

At the time of this writing, this blog consists of 342 posts and 2 static pages.

Kudos

Many people helped me with advice and sometimes even bug fixes to make this migration possible.

Amongst those I would especially like to thank for helping with the migration:

  • the whole Habari community, especially Mike Lietz, Chris Meller, Owen “ringmaster” Winkler and Les Henderson;
  • the whole Pelican community, especially Talha “talha131” Mansoor;
  • NixOS for saving my arse with the migrator and Domen “iElectric” Kožar for showing me how to (ab)use Nix;
  • Kiberpipa for being there for me to vent my frustration.

Last, but not least, I would like to thank the Internet Archive for archiving my old posts (and all the internet in general). It is doing great work at being the archive and library in the digital era, and it is a great time to express your support for it.

hook out → sipping Taylors of Harrogate Yorkshire Gold tea with raw milk and superb-quality chestnut honey[1]


  1. By a dear friend of mine, Brodul, who I have just noticed uses Pelican as well. 

  2. I had to use this wrapper that saves the content to the server, because the export plugin exported the data properly, but due to the CPU being overloaded did not deliver it to my browser intact. If you simply use the export plugin directly, you can omit this step. 

  3. I am still learning Python, but it is the language in which I am already confident enough to commit also code and not just bug reports. 

  4. The proper extension for MarkDown is .markdown and Vim takes that into consideration. Although .md is preferred by GitHub &al. due to its shortness, it is not a unique identifier and historically others have claimed it before MarkDown. 

  5. Well, to be completely honest in practice setting it up was not so trivial, since at the time of this writing emerging Haskell – which is a dependency of the importer – in Gentoo on an ARM machine is not possible (yet). I worked around it by (ab)using a NixOS VM on my laptop. 

  6. Yes, I am well aware you can run Vim and Git on MeeGo Harmattan and I do use it. But Vim on a touchscreen keyboard is not very fun to use for brainstorming. 

  7. At the time of writing this blog includes 342 articles and 2 pages, which took Pelican 361 seconds to generate on my poor little ARM server. 

  8. “Ars” in Latin and “τέχνη” in Ancient Greek originally have the same meaning of “craft, skill or trade”, but while the former hints more at artisanal skill, the latter hints more at technical skill (hence the names). 
