Streamlining a Git–Markdown writing process that’s just social enough

Introduction

Generating PDFs and HTML from Markdown always seems to break. I want a Git repo for each project, and I don't want to fiddle with special characters.

After a decade of experimenting with web publishing, all I ever wanted was a GitHub clone with serif fonts. Please see dreams/ for a college web trip and ismism/ for would-be anarcho-satire.

Then I stumbled on a paradigm so elegant that it's transparent both to the existing publishing landscape and to itself. In other words, the projects don't depend on a website to store useful metadata (the changelog); they don't require a CMS or static site generator to be published, read, and updated; and they don't need any particular naming scheme or folder structure of their own (although static site generators often impose one).

This paper describes a publishing model that allows the author to keep projects as a collection of folders, whose contents and metadata are publishable on any platform that supports Git repos.

This is not the only version of the paper. Previously I incompletely described a system based on static file display. This version is significantly reworked and much of the old material is in an appendix.

Reasoning

When you have a body of work, so many files and folders, how do you keep it? What's the shortest path from an idea to its computer representation to its publication? How easy is it for others to provide substantive feedback, and how quickly can you propagate errata to publications? Let's not forget the data sharing question!

The Git portfolio on disk

The author's work should be a self-contained collection of folders, easy to back up and seamlessly move from one computer to the next.

Git is an ideal tool for this because the project folder itself contains the changelog. Git's remote configuration allows pushing changes to, and pulling them from, multiple places.
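As a rough illustration of that multi-remote configuration, the Python sketch below builds the Git commands that make one remote push to several forges at once. The remote name and URLs in the example are hypothetical. The commands themselves, git remote add and git remote set-url --add --push, are standard Git.

```python
import subprocess
from typing import List


def mirror_push_commands(remote: str, urls: List[str]) -> List[List[str]]:
    """Build git commands so that one remote fetches from urls[0]
    and pushes to every URL in urls.

    Once any pushurl is configured, git push stops using the plain
    url, so the first URL is added as a push URL as well.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    commands = [["git", "remote", "add", remote, urls[0]]]
    for url in urls:
        commands.append(["git", "remote", "set-url", "--add", "--push", remote, url])
    return commands


def apply(commands: List[List[str]], repo_dir: str) -> None:
    """Run each command inside repo_dir, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical URLs: a self-hosted Gitea repo plus a GitHub mirror.
    for cmd in mirror_push_commands(
        "origin",
        ["https://pjc.is/me/example.git", "git@github.com:example/example.git"],
    ):
        print(" ".join(cmd))
```

Applied to a repo that has no origin yet, git push origin would then update every listed URL. A plain git pull origin would still fetch only from the first URL. Pulling from another forge needs a separate remote.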

Many extensible web interfaces can display this metadata, and those with a relational database backend can easily index and search it.

Meanwhile the projects, complete with selectively unpublished private files, live in pjc.is/, for example.

Git workflow diagram from A Visual Git Reference

Federated collaborative publishing

This publishing model turns the author into a federated node in a network. The node can be based on the platform of his choice, pushing authenticated copies to other platforms.

For example, my platform of choice is pjc.is and I selectively publish works elsewhere. Only my best DIYbio work goes on the BosLab's GitHub, and some of the BioTorrents.de work appears on Oppaitime's Git, for a non-humble example.

With this and OAuth2-based account creation via Facebook, GitHub, and Google, it's possible for someone to

  • register through GitHub and use the social features here with their GitHub account,
  • find the work mirrored on GitHub and use the same social features there, and
  • clone the work and import it into a new platform with the possibility of further federation.

Note that I'm describing ways to directly interact with the published work itself. Email, Slack, social media, and conversation are interactions about the work.

The author must interpret and use spoken and written critique on the reader's behalf, instead of the reader himself directly publishing proposed changes to the work.

Extending the portfolio's reach

This is the power inherent in the model: the flexibility to base yourself anywhere online and mesh with other nodes as appropriate.

I strongly believe in personal web sites. Incidentally, personal translations of Anglo-Saxon poetry are a very common example in articles "lamenting the old, quirky web," and that web is alive and well here.

The bulk of this paper describes my process of setting up a self-hosted Gitea instance, how the project files can interact with other software like MkDocs, and how the Git repos can support extra metadata for software like cgit.

The self-hosted and customizable Gitea, with search and OAuth2 registration, leaves me free to develop an aware and dynamic portfolio tailored to my specifications.

Protocol

System profile

$ uname -a
OpenBSD asgarth.pjc.is 6.6 GENERIC#64 amd64

$ git --version
git version 2.25.1

$ gitea -v
Gitea version 1.11.3 built with go1.13.7

$ nginx -v
nginx version: nginx/1.16.1

$ postgres -V
postgres (PostgreSQL) 12.2

Collecting the work

The sad truth is that an individual equilibrium between writing history and style takes time to develop. My own work was variously formatted in plain text, HTML, and LaTeX before I came to Markdown, and my requirements are unique to my genres.

I designed the templates at me/templates for simplicity, influenced by Mort Yao's blog post. The goal is to keep formatting and metadata out of the documents themselves while still producing output that has both.

Tools like Pandoc tended to produce suboptimal Markdown from LaTeX, which I edited by hand. The vi commands %s/\n/ /gc and %s/\. /\.\r/gc help prepare it for Git by putting one sentence per line, so diffs show exactly which sentences changed.
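For batch use, the same reflow can be sketched in Python. This is an illustrative equivalent of the two vi substitutions, not the author's tooling. It joins each paragraph's wrapped lines, then breaks after every period followed by spaces. It assumes plain prose paragraphs with no lists or code blocks. Unlike vi's c flag, it doesn't ask for confirmation, so an abbreviation like "e.g. " gets split too.

```python
import re


def one_sentence_per_line(paragraph: str) -> str:
    """Join a paragraph's wrapped lines, then break after every period-space."""
    # Equivalent of %s/\n/ /g, restricted to one paragraph.
    joined = " ".join(line.strip() for line in paragraph.splitlines() if line.strip())
    # Equivalent of %s/\. /\.\r/g, tolerating runs of spaces after the period.
    return re.sub(r"\. +", ".\n", joined)


def reflow(text: str) -> str:
    """Apply one_sentence_per_line to every blank-line-separated paragraph."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(one_sentence_per_line(p) for p in paragraphs)


if __name__ == "__main__":
    sample = "First sentence. Second\nsentence wraps. Third.\n\nNew paragraph. Done."
    print(reflow(sample))
```

Working paragraph by paragraph keeps the blank lines that separate Markdown paragraphs. Running the first vi command over a whole file would remove them.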

Pandoc's demos page has all the relevant commands. To convert a typical paper and its bibliography, you might run:

# latex → markdown
pandoc --filter pandoc-citeproc --bibliography=biblio.bib -s paper.tex -o paper.md
pandoc-citeproc --bib2yaml biblio.bib > biblio.yaml

Special character hell

I've encountered too many rendering inconsistencies to suggest typing anything but the intended Unicode character. For example, MkDocs converts -- to an en dash but Gitea doesn't, and both convert ... to an ellipsis.

My process is that whenever I encounter a new formatting inconsistency, I substitute the proper Unicode character throughout each project. This gradually removes ambiguity from the writing and ensures the Markdown source itself is decently typeset.

The documents should be equally legible on their own, without a platform. Quotation marks, being obnoxious, are the only exception to this rule.
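Below is a minimal sketch of that substitution pass. It assumes the two inconsistencies named above, -- and ..., are the ones to fix. The SUBSTITUTIONS table is hypothetical and would grow as new cases turn up. The pass skips fenced code blocks and single-backtick code spans, so command-line flags such as --user survive. It leaves quotation marks alone, per the exception.

```python
import re

FENCE = "`" * 3  # a fenced-code delimiter line starts with this

# Hypothetical table of ASCII sequences that renderers treat
# inconsistently, mapped to the intended Unicode character.
SUBSTITUTIONS = [
    (re.compile(r"(?<=[^-\s])--(?=[^-\s])"), "\u2013"),  # 10--12 -> en dash
    (re.compile(r"(?<= )--(?= )"), "\u2013"),            # a -- b -> spaced en dash
    (re.compile(r"\.\.\."), "\u2026"),                   # ... -> ellipsis
]


def fix_prose(text: str) -> str:
    """Apply every substitution to a run of plain prose."""
    for pattern, char in SUBSTITUTIONS:
        text = pattern.sub(char, text)
    return text


def normalize(markdown: str) -> str:
    """Replace ambiguous ASCII with Unicode outside of code.

    Lines inside fenced code blocks are copied unchanged, and within a
    line only the text outside single-backtick spans is rewritten.
    Straight quotation marks are never touched.
    """
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)
        else:
            # After splitting on backticks, even-indexed pieces are prose
            # and odd-indexed pieces are the contents of code spans.
            pieces = line.split("`")
            for i in range(0, len(pieces), 2):
                pieces[i] = fix_prose(pieces[i])
            out.append("`".join(pieces))
    return "\n".join(out)
```

The dash patterns deliberately ignore runs of three or more hyphens, so Markdown horizontal rules and front-matter delimiters are left intact. Reviewing git diff before committing is still worthwhile, because a table like this can't tell an intended double hyphen from a typographic one.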

Nginx and PostgreSQL

Nginx. Gitea needs a web server and a database. Nginx acts as a TLS reverse proxy, and the site root redirects to my profile's starred repos. This communicates the site's portfolio intent and provides a curated list of my best work.

All topology points (repos, users, and organizations) are one click away, and extra features like search and profile settings are no more than two clicks away. Please see the Nginx config below.

# Take note of http://wiki.nginx.org/Pitfalls
server {
	listen 443 ssl http2;
	listen [::]:443 ssl http2;

	server_name pjc.is www.pjc.is;

	ssl_certificate     /etc/ssl/pjc.is.fullchain.pem;
	ssl_certificate_key /etc/ssl/private/pjc.is.key;

	access_log off;
	error_log  /var/www/logs/pjc.is-error.log;

	# https://www.nginx.com/blog/creating-nginx-rewrite-rules/
	rewrite ^/$ https://pjc.is/me?tab=stars permanent;

	# https://nginx.org/en/docs/http/ngx_http_proxy_module.html
	location / {
		proxy_pass       http://localhost:3000;
		proxy_set_header Host      $host;
		proxy_set_header X-Real-IP $remote_addr;
	}
}

The certificates come from EFF's Certbot and the secure headers in conf.d/tls_params come from Cipherli.st (down since 2020-03-28) and OWASP. Please see the secure headers below and note the deprecated and optional features.

# https://cipherli.st
ssl_protocols TLSv1.3 TLSv1.2; # Requires nginx >= 1.13.0 else use TLSv1.2
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/ssl/dhparam.pem; # openssl dhparam -out /etc/ssl/dhparam.pem 4096
ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
ssl_ecdh_curve secp384r1; # Requires nginx >= 1.1.0
ssl_session_timeout 10m;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off; # Requires nginx >= 1.5.9
ssl_stapling on; # Requires nginx >= 1.3.7
ssl_stapling_verify on; # Requires nginx >= 1.3.7
resolver 208.67.222.222 208.67.220.220 valid=300s;
resolver_timeout 5s;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
add_header X-Frame-Options "sameorigin";
add_header X-Content-Type-Options "nosniff";
add_header X-XSS-Protection "1; mode=block";

# https://owasp.org/www-project-secure-headers/
#add_header Public-Key-Pins "pin-sha256=''; pin-sha256=''; report-uri='https://pjc.is'; max-age=10000; includeSubDomains"; # deprecated
add_header Content-Security-Policy "script-src 'self' 'unsafe-eval' 'unsafe-inline'";
add_header X-Permitted-Cross-Domain-Policies "none";
add_header Referrer-Policy "no-referrer";
add_header Expect-CT 'max-age=86400, enforce, report-uri="https://pjc.is"';
add_header Feature-Policy "vibrate 'none'; geolocation 'none'";

# /etc/nginx/nginx.conf
server_tokens off;

PostgreSQL. OpenBSD's default documentation is sufficient. Please see excerpts of /usr/local/share/doc/pkg-readmes/postgresql-server below, with comments marking where I deviated in the TLS certificate generation.

$OpenBSD: README-server,v 1.29 2020/02/12 13:20:34 sthen Exp $

If you are installing PostgreSQL for the first time, you have to create
a default database first.  In the following example we install a database
in /var/postgresql/data with a dba account 'postgres' and scram-sha-256
authentication. We will be prompted for a password to protect the dba account:

       # su - _postgresql
       $ mkdir /var/postgresql/data
       $ initdb -D /var/postgresql/data -U postgres -A scram-sha-256 -E UTF8 -W

                    ----------------------------------------

To allow SSL connections, edit postgresql.conf and enable the
'ssl' keyword, and create keys and certificates:

       # su - _postgresql
       $ cd /var/postgresql/data
       $ umask 077
       $ openssl genrsa -out server.key 2048 # 4096
       $ openssl req -new -key server.key -out server.csr

Either take the CSR to a Certifying Authority (CA) to sign your
certificate, or self-sign it:

       $ openssl x509 -req -days 365 -in server.csr \
         -signkey server.key -out server.crt # -days 3650

                    ----------------------------------------

The default sizes in the GENERIC kernel for SysV semaphores are only
just large enough for a database with the default configuration
(max_connections 40) if no other running processes use semaphores.
In other cases you will need to increase the limits. Adding the
following in /etc/sysctl.conf will be reasonable for many systems:

        kern.seminfo.semmni=60
        kern.seminfo.semmns=1024

                    ----------------------------------------

By default, the _postgresql user, and so the postmaster and backend
processes run in the login(1) class of "daemon". On a busy server,
it may be advisable to put the _postgresql user and processes in
their own login(1) class with tuned resources, such as more open
file descriptors (used for network connections as well as files),
possibly more memory, etc.

For example, add this to the login.conf(5) file:

        postgresql:\
                :openfiles=768:\
                :tc=daemon:

Rebuild the login.conf.db file if necessary:

        # [ -f /etc/login.conf.db ] && cap_mkdb /etc/login.conf

I also completely disabled TCP communication with listen_addresses = '' in postgresql.conf. The Unix socket lives in /tmp and is extremely easy to work with.

Git, Gitea, GitHub, and GutBub

Gitea was similarly painless to set up, being one large statically linked binary. The daemon listens on localhost:3000 and communicates with PostgreSQL via the Unix socket.

Gitea's config cheat sheet is an essential resource, and OpenBSD's default settings are sensibly private. Customizing Gitea explains how to change the website's functionality and display.

Enabling OAuth2 was also fairly straightforward. The interface documents exactly where to go to add each provider. Then it's a simple matter of generating a Client ID and Secret for Gitea.

The custom settings all live in the _gitea user's home folder, /var/gitea, which also holds the repo copies, the search indexes, and a full shell environment.

The current challenge is to gradually convert the Git forge into a publishing platform. Besides improved typography and navigation, and leveraging the familiar interface, future plans include integrating tables of contents, natural language tools, and ePub exports.

Please see the custom/ folder for the custom code.

A note on sharing data sets

It's rarely possible or desirable to couple the data and its publication. Scientific data sharing is an open problem. The platform BioTorrents.de, described at biotorrents/announcement, is a nascent solution.

That publication also contains more details about setting up a Unix server for self-hosted projects. Most of the configuration at the BioTorrents.de announcement originates on the server I described here.

Appendix: The MkDocs system that wasn't

System profile

$ uname -a
OpenBSD asgarth.pjc.is 6.5 GENERIC.MP#820 amd64

$ python --version
Python 3.6.8

$ pip --version
pip 19.0.3 from /home/mkdocs/.local/lib/python3.6/site-packages/pip (python 3.6)

$ mkdocs --version
mkdocs, version 1.0.4 from /home/mkdocs/.local/lib/python3.6/site-packages/mkdocs (Python 3.6)

$ which pandoc
which: pandoc: Command not found.

Pandoc workaround and Python

Note that Pandoc is unavailable from OpenBSD's package tools, so I compiled the documents on another system such as Debian or macOS. The workaround of building it with Haskell's cabal tools is beyond my scope.

I installed the dependencies with doas pkg_add nginx python and made a dedicated MkDocs user with doas adduser. I then installed Pandoc elsewhere with sudo apt install pandoc pandoc-citeproc.

The pip documentation has good install instructions. Note that I needed the --user flag for all pip commands because the MkDocs user ran the daemon.

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py --user
pip install --user mkdocs mkdocs-pandoc # https://pypi.org/search/?q=mkdocs
mkdocs new /var/www/html/pjc.is
cd /var/www/html/pjc.is
mkdocs serve

I ran MkDocs on my laptop, or opened a SOCKS proxy to the server with ssh -D 1080 pjc.is. I also wrote small rsync scripts to synchronize the website's instances. In either case, the development server at localhost:8000 should pick up changes to mkdocs.yml.

MkDocs and Markdown extensions

The MkDocs documentation is thorough, and a full review of the options is beyond my scope. Take care with indentation in mkdocs.yml, especially for sub-options like those of toc.

Some of the Python Markdown extensions may help with academic docs/data writing. I also used a few third-party extensions installed with pip.

pip install --user markdown-captions MarkdownSubscript MarkdownSuperscript

todo: Collect all the Markdown extensions and note which ones didn't work right.