Refulang Developer Diary 1 – Introduction and Invitation for Collaboration

Introduction

This post introduces a pet project I have been working on, on and off, since 2011. It is a strongly typed, lazily evaluated programming language that aims to be a hybrid between imperative and functional languages. It is written in C and is still at a very early stage, since it never managed to outgrow the status of a pet project of a single developer.

Until now. Through this and subsequent posts I would like to introduce Refulang to the world and openly invite anyone interested to collaborate with me in the project. Everything is on Github and in a state where developers from around the world can contribute to it by reading the code and opening Pull Requests and issues.

You can get the chance to learn a lot and experiment by contributing to the development of a programming language and have a say in many of its design choices.

Before you say it: yes, I am aware of Rust, I actively use it and I love it. Back when I started this project Rust was in its fledgling stages and I did not know of its existence.

Some History

Refulang started as a silly project of mine called the Refu library, where I kept a lot of common functionality I was using in all of my C projects. In time I got the idea to start developing a language and Refulang started forming. What used to be the Refu library is now rfbase, a library with common functionality used inside the language as a submodule.

I unfortunately never had the time to work full time on the language, so I was always working either late at night or on weekends and progress has been rather slow. Additionally, as is often the case with such projects, the code is probably rather ugly in some places. Regardless, it is now in a state where what is there is very well tested, works and is well organized.

Some details about the language

The language is by no means perfectly defined at the moment. It compiles only on Linux but there are issues tracking porting effort for both macOS and Windows. There is no language specification apart from something I wrote long ago in orgmode but I believe any document at this point in time should be written in Markdown to encourage collaboration. So for now the code is the specification of the language. Still I can give a good description of the language’s current design and goals. Some of the following are not yet implemented.

From the very beginning Refulang has aimed at being a hybrid between imperative and functional languages, keeping first and foremost in mind the goal of being usable and understandable.

It is a curly braces language with a strong type system based on algebraic data types. It supports generics in the form of type parameters. Naturally it also implements pattern matching in order to deconstruct the algebraic data types. It compiles to LLVM bitcode.

type product_identifier {
    numeric_id:i64 | text_id:string
}

fn process_id(id:product_identifier)
    numeric_id:i64 => print("ID is a number: " numeric_id)
    tid:string => print("ID is ASCII: " tid)

fn main(args) -> u32
{
    id1 = product_identifier(5642)
    id2 = product_identifier("FF0AAAXYN")
    id3:product_identifier = if args[0] > 10 { "FFXEWQ01" } else { 64321 }

    process_id(id1)
    process_id(id2)
    process_id(id3)

    return 0
}

Above is a simple example with an algebraic data type, product_identifier, being instantiated in different ways and then deconstructed by an implicit match operator as part of the process_id() function.

fn use_array(arr:u64[6]) {
    // arr would only be evaluated when entering this function
    for i in arr {
        print(i)
    }
}

fn foo() {
    x = 0
    // since all values are known at compile time
    // the type of arr will be deduced to u64[6]
    arr = for a in 0:2:10 { x * 2 }

    // other code follows
    // ....
    // ....
    use_array(arr)
}

Above is another example where we can see lazy evaluation of an array using a for expression. The language aims to be lazily evaluated wherever possible in order to also allow for infinite data structures defined from the algebraic data types. I say aims, because this is a part of the language that is not yet implemented.

Furthermore Refu encourages programming to the interface by using typeclasses, a way to guarantee behaviour about objects of a specific type. Typeclasses act much like interfaces act in Java or traits in Rust. They are inspired by Haskell.

Refu programs are organized in modules that encompass specific functionality. Everything in a module is private by default unless explicitly exported. Each module can import objects and functions from other modules. Modules can also have signatures separated from their implementation. That is, a module can have a single signature defining the type and interface of the module but also multiple implementations. As an example consider an IO module that implements I/O functionality for Linux, Windows, ARM or even JavaScript!

The memory model of the language (even though not perfectly defined yet) aims to give freedom to the developer when required but in most cases it will try to act invisibly. The memory model should be designed in such a way that the lifetime of most objects can be determined statically at compile time and proper optimization can occur. Rust performs such optimizations very well but requires the developer to explicitly define lifetimes and ownership via syntactic constructs. This decreases the usability of the language and makes for a much steeper learning curve. Refulang aims for code that is as optimized as possible without sacrificing ease of use. It’s all about trying to find a golden mean between usability and speed.

Current state of the code

The code is on Github in two different repositories. The main repository contains the entirety of the compiler code and uses the rfbase C library as a submodule for functionality that could be easily abstracted for other projects too.

The codebase is organized into five distinct sections that correspond to the stages of the compilation pipeline.

  • Lexer: The lexer of the language which reads in the source and splits it into a number of lexical tokens.
  • Parser: A recursive descent parser that continuously reads in tokens fed to it from the lexer and formulates the Abstract Syntax Tree (AST).
  • Analyzer: The analyzer stage is one of the most important ones. This is where all the typechecking and correctness analysis happens.
  • Intermediate Representation: This is the stage where the RIR (Refu Intermediate Representation) is created. The typechecked code is converted into an intermediate format where both further analysis and conversion to final backend code is much easier.
  • Backend Code Generation: The final stage of compilation where the RIR is converted into backend executable code. The backend code generation is modular so that many different backends could be plugged in, but for now the only available backend is LLVM.

How you can get involved

Refulang is still at an initial design and implementation level so there are many ways you can contribute.

  • You can contribute to the language design itself by participating in the discussion of how to design certain language features in the Github issues and in gitter.
  • You can read the code and get a better understanding of how the language works and then help write up more of the much needed documentation.
  • You can pick up any of the low hanging fruit issues, develop a solution for them and open a Pull Request. If you are feeling adventurous you can also check any of the other bigger issues.
  • If you have any feedback or comments you can open an issue on Github or come to gitter to discuss.

What makes Refulang exciting?

A language based on an algebraic data type system with lazy evaluation. An intuitive to use module system which can extend into a nice packaging system. A memory model that tries to optimize as much as possible without introducing too many concepts to the user but instead pushing all the work to the compiler and striving to be user friendly.

But first and foremost what makes Refulang exciting for someone at this point in time is its malleability. As a developer reading about, using and contributing to the development of Refulang you get the chance to shape a new programming language and guide its design and development.

What can you get out of it?

Getting involved with the development of Refulang at this point will not require a lot of your time (depending on your level of interest/commitment) and will give you the chance to:

  • Contribute to a new programming language from an early stage, participate in the language design process and have a hand in its creation.
  • Work in an open source project, being able to show what you are doing to everyone around the world.
  • Learn a lot about compilers and language design.
  • Participate in a cool project using the C language.

Conclusion

I hope you enjoyed this small introduction to Refu. It has been an extremely rewarding journey for me working on it so far but I now need help. Please join me in Github or gitter, bring new life to this project and let us together make an exciting new programming language.


About the Author

Lefteris Karapetsas is a passionate developer/tinkerer currently located in Berlin.

After graduating from the University of Tokyo, Lefteris has been developing backend software for various companies including Oracle and Acmepacket. He is an all-around tinkerer who loves to take things apart and put them back together, learning how they work in the process.

His interests include language/compiler design, Artificial Intelligence, Robotics, Systems programming, Distributed Systems and Blockchains. He feels at home with C code and GDB and tries to forward all that energy into the development of Refulang.

He has gained a lot of blockchain expertise by being part of Ethereum as a C++ core developer since its beginnings, having worked on Solidity, the ethash algorithm, the core client and the CI system. He had a hand in the creation of the DAO and in the cleanup after the hack. He is developing Sikorka, a system enabling people to use the Ethereum blockchain out in the real world. At the same time he is working with Brainbot AG as the project manager for Raiden. Raiden is bringing payment channels to Ethereum allowing vast scaling of the protocol by leveraging off-chain transactions.


Twitter: @lefterisjp Github: Lefterisjp contact: lefteris@refu.co

System encryption with passphrase protected usb key

Introduction

Over the past month a lot of security issues have popped up in my work and personal life, the most prominent of which was the DAO hack. All these have led me to think a lot about security and prompted me to learn and utilize new methods I had never used before. All in all I can say that the events of the past few months have made me quite a bit paranoid.

As a result of that I started reading about full system encryption of the root file system and wanted to apply it to a newly installed Arch Linux system. The ArchWiki has a very extensive guide on system encryption which can serve as a reference for anyone wanting to do the same. One scenario not covered by that guide is how to encrypt the system with a password protected keyfile located on a USB stick. This is the scenario we are going to cover in this guide.

Encrypting an entire system

We will be using the Device Mapper crypt module in order to encrypt block devices using the Linux Kernel’s crypto API. We assume a very simple setup with 2 partitions. The first partition will be the boot partition and the other is going to be the root partition which we will encrypt.

We will use the Logical Volume Manager (LVM) in order to have a flexible root logical volume on top of a LUKS encrypted partition. We will essentially be using the LVM on LUKS methodology of the ArchWiki but with a big change that will allow us to have a passphrase protected keyfile on a USB stick. I will assume you are installing an Arch Linux machine following the wiki and explain the different/additional steps that need to be taken in order to achieve the encryption.

Preparing the disk

For extra safety you can securely wipe the entire disk using LUKS as can be seen here. After that is done, and depending on whether you have a UEFI motherboard or not, create a UEFI or an MBR boot partition. Also create a root partition. To do so you can use parted. Once you have created the partitions you will need to format them. In the examples below we will assume a UEFI partition and that your drive is /dev/sda. Adjust the commands depending on your drive name.

Create the 2 partitions.

(parted) mkpart ESP fat32 1MiB 513MiB
(parted) set 1 boot on
(parted) mkpart primary ext4 513MiB 100%
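
The same layout can also be scripted instead of typed at the (parted) prompt. The following is only a sketch: the helper name print_parted_cmd is made up for this example, it merely prints the command so it can be reviewed before being run as root, and it assumes the disk already carries a partition table, which the commands above do not create.

```shell
# Hypothetical helper: print (not run) a one-shot parted command that
# mirrors the three interactive commands above. Review the output and
# run it by hand, as root, once the disk name is confirmed.
print_parted_cmd() {
    disk="$1"
    printf 'parted --script %s mkpart ESP fat32 1MiB 513MiB set 1 boot on mkpart primary ext4 513MiB 100%%\n' "$disk"
}

print_parted_cmd /dev/sda
```

Printing rather than executing is a deliberate choice here, since partitioning the wrong drive is not something you can undo.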

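Format the boot partition. Here we are creating a UEFI boot partition. The root partition does not need a file system yet: it is about to be overwritten by the LUKS container, and the ext4 file system will be created later on the logical volume inside it.

mkfs.fat -F32 /dev/sda1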

Now we can use cryptsetup in order to create the encrypted container on top of the root partition. You can choose a lot of different options for the encryption like the hash algorithm used for key derivation, or the number of iterations to be used for passphrase processing.

cryptsetup luksFormat /dev/sda2

After that you will have to open the container.

cryptsetup open --type luks /dev/sda2 lvm

The decrypted container is now available at /dev/mapper/lvm.

Preparing the logical volumes

Now we are going to create a physical volume on top of the opened LUKS container.

pvcreate /dev/mapper/lvm

Subsequently create a volume group and create the root logical volume in it. You should replace VolName with the name you would like your volume group to have.

vgcreate VolName /dev/mapper/lvm
lvcreate -l 100%FREE VolName -n root

Finally you should format the logical volume and mount it.

mkfs.ext4 /dev/mapper/VolName-root
mount /dev/mapper/VolName-root /mnt

Preparing and configuring the boot partition

For our example we have a UEFI boot partition on /dev/sda1. You can always adjust this guide to any other type of boot partition your system may have. Mount the boot partition and continue with the installation procedure up to the point where you deal with initramfs.

mkdir -p /mnt/boot
mount /dev/sda1 /mnt/boot

The bootloader loads the kernel and the initramfs image from the boot partition. On Arch Linux the initramfs is generated by mkinitcpio. It is essentially a very small early userspace environment which loads various kernel modules and sets up everything necessary before handing control over to init.

We can use the already existing encrypt and lvm2 hooks of mkinitcpio. To enable them edit /etc/mkinitcpio.conf and add them to the HOOKS line. They should be added before the filesystems hook, so in essence it should look like this:

HOOKS="... encrypt lvm2 ... filesystems ..."
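
If you would rather not edit the line by hand, the change can be sketched as a small filter. The function name add_encrypt_hooks is made up for this example, and the sample HOOKS line is an assumption about a typical default, so yours may differ. It reads the configuration on stdin and writes the modified copy to stdout without touching the real file.

```shell
# Hypothetical filter: read an mkinitcpio.conf on stdin and write a copy
# with "encrypt lvm2" inserted in front of the filesystems hook.
# Only the line starting with HOOKS= is touched; nothing is edited in place.
add_encrypt_hooks() {
    sed '/^HOOKS=/ s/ filesystems/ encrypt lvm2 filesystems/'
}

# Example on an assumed default line:
echo 'HOOKS="base udev autodetect modconf block filesystems keyboard fsck"' | add_encrypt_hooks
```

Running it as add_encrypt_hooks < /etc/mkinitcpio.conf lets you inspect the result before copying it over. Note that it does not check whether the hooks are already present, so applying it twice would add them twice.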

Run the following in order to create the updated initcpio scripts.

mkinitcpio -p linux

Now you should figure out the UUID of your physical device. You can do so by running:

blkid /dev/sda2

/dev/sda2: UUID="8197c881-160c-465c-a15c-96b59as26157" TYPE="crypto_LUKS" PARTUUID="fe8d1a97-d10b-43c9-a748-972b0af8a09b"
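
When scripting the next steps it helps to have the bare UUID rather than the whole line. Running blkid -s UUID -o value /dev/sda2 prints just the value. As a sketch of the same idea on already captured output, the helper below (its name is made up for this example) extracts the UUID field from a line like the one above.

```shell
# Hypothetical helper: pull the UUID field out of one line of blkid output.
# The leading space in the pattern keeps PARTUUID= from matching.
luks_uuid_from_blkid() {
    sed -n 's/.* UUID="\([^"]*\)".*/\1/p'
}

# Fed with the sample line from this guide:
echo '/dev/sda2: UUID="8197c881-160c-465c-a15c-96b59as26157" TYPE="crypto_LUKS" PARTUUID="fe8d1a97-d10b-43c9-a748-972b0af8a09b"' | luks_uuid_from_blkid
```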

Replace /dev/sda2 with the partition that holds your encrypted root. Once that is done you can edit your bootloader configuration to add the following kernel arguments, which will be picked up by the encrypt hook in order to decrypt your device at boot. If for example you are using systemd-boot then you should edit /boot/loader/entries/entry.conf like so:

title Arch Linux
linux /vmlinuz-linux
initrd /intel-ucode.img
initrd /initramfs-linux.img

options cryptdevice=UUID=8197c881-160c-465c-a15c-96b59as26157:VolName root=/dev/mapper/VolName-root quiet rw

Remember to change VolName to the name of the volume group you created.
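
Since the UUID and the volume group name each have to end up in the right place on the options line, here is a small sketch that assembles it from the two values. The function name make_options_line is made up for this example and the output simply follows the format shown in the entry above.

```shell
# Hypothetical helper: build the "options" line of the loader entry from
# the LUKS partition UUID and the volume group name.
make_options_line() {
    uuid="$1"
    vg="$2"
    printf 'options cryptdevice=UUID=%s:%s root=/dev/mapper/%s-root quiet rw\n' "$uuid" "$vg" "$vg"
}

make_options_line 8197c881-160c-465c-a15c-96b59as26157 VolName
```

This sketch does not handle volume group names that contain a hyphen, which device mapper escapes by doubling it in the /dev/mapper name.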

Finally make sure to properly populate /etc/fstab so that after decryption the logical root partition is properly mounted at boot:

# /etc/fstab: static file system information
#
# <file system> <dir>   <type>  <options>       <dump>  <pass>
# the decrypted root logical volume
/dev/mapper/VolName-root        /               ext4            rw,relatime,data=ordered        0 1

# /dev/sda1
UUID=0C02-13D4          /boot           vfat            rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro    0 2

Once that is done then you can simply reboot and you will be prompted for the passphrase to unlock your encrypted root partition at boot. Congratulations you have now encrypted your root partition!

Using an encrypted USB stick

Having an encrypted partition that decrypts simply by typing a password is nice, but if a malicious actor learns your password they can decrypt your root partition without any trouble and gain access to your data. One way to make this more difficult is to use 2-factor authentication: a USB stick holds a passphrase protected keyfile which acts as the key to decrypt the root partition. A malicious actor would then need both physical access to the USB stick containing the keyfile and the passphrase that unlocks it.

Create the keyfile

The simplest way to achieve the 2FA effect is to again use LUKS in order to create an encrypted keyfile on the USB stick. Let’s assume now that the USB stick is located at /dev/sdc1 and that its UUID is 1193c881-267f-134f-123a-12b34as56357.

We now have to mount the USB stick, create a container file on it, encrypt it with LUKS and fill it with random key material. You can replace OurKey with whatever name you would like the decrypted LUKS volume of the key to have. Also you should replace /dev/sda2 with the root partition you want to unlock with that key.

mkdir -p /mnt/stick
mount /dev/sdc1 /mnt/stick
# create the container file that will hold the key
dd if=/dev/zero of=/mnt/stick/key.luks count=2057
cryptsetup --align-payload=1 luksFormat /mnt/stick/key.luks
cryptsetup luksOpen /mnt/stick/key.luks OurKey
# fill the opened volume with random data; dd stops with a
# "No space left on device" message once it is full, which is expected
dd if=/dev/urandom of=/dev/mapper/OurKey
# register the random data as an additional key of the root partition
cryptsetup luksAddKey /dev/sda2 /dev/mapper/OurKey
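
The count=2057 may look arbitrary. A quick check, separate from the actual setup and run against a temporary file rather than the stick, shows how large the container ends up. My reading of the commands is that most of that space goes to the LUKS header and only the small remainder holds the random key material, but treat that as an assumption rather than a statement about LUKS internals.

```shell
# dd uses 512-byte blocks when bs is not given, so count=2057 yields a
# file of 2057 * 512 bytes. This writes to a temporary file only.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" count=2057 2>/dev/null
size=$(( $(wc -c < "$tmp") ))
echo "container size: $size bytes"
rm -f "$tmp"
```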

Create an initcpio hook

We will create an initcpio hook so that the early userspace can prompt us for the passphrase and decrypt the encrypted partition during the boot process. The beauty of initcpio is that it’s all simply shell scripts and as such they are quite easy to understand.

First of all you should decide on a name for your hook. I called mine lefcrypt but you can use whichever name you want. To create a hook you need to create 2 files under 2 different directories.

Create /usr/lib/initcpio/install/lefcrypt:

#!/bin/bash

build() {
    # Copied from the encrypt hook install script
    local mod

    add_module loop
    add_module dm-crypt
    if [[ $CRYPTO_MODULES ]]; then
        for mod in $CRYPTO_MODULES; do
            add_module "$mod"
        done
    else
        add_all_modules '/crypto/'
    fi

    add_binary "cryptsetup"
    add_binary "dmsetup"
    add_file "/usr/lib/udev/rules.d/10-dm.rules"
    add_file "/usr/lib/udev/rules.d/13-dm-disk.rules"
    add_file "/usr/lib/udev/rules.d/95-dm-notify.rules"
    add_file "/usr/lib/initcpio/udev/11-dm-initramfs.rules" "/usr/lib/udev/rules.d/11-dm-initramfs.rules"


    add_runscript
}

help() {
    cat <<HELPEOF
This is our custom hook for decrypting a keyfile from a USB stick.
HELPEOF
}

The above is the build script of the hook. It adds the kernel modules, binaries and udev rules that the hook needs to the image, and provides a help text which appears if you type mkinitcpio -H lefcrypt.

Also create /usr/lib/initcpio/hooks/lefcrypt:

#!/usr/bin/bash

run_hook() {
    modprobe -a -q dm-crypt >/dev/null 2>&1
    modprobe loop
    [ "${quiet}" = "y" ] && CSQUIET=">/dev/null"

cat << "EOF"

                                          ___.-----.___
                                       .-'. . . . . . .`-.
                                     .'  ` . . . . . .  ' `.
                                   .' ` ` . . . . . . '  '  `.
 .----------------------..--.     / `` ` ` ` _.---._ ' ' ' '  \
|  ,                 `--||--.\   / ` ` ` `.-'_.---._`-.' ' ' ' \
|  `                 ,--||--'/  [\ ` ` `.'.-' ..| ..`-.`.' ' '' \
`-----------------------`'--'  _[/ ` ``/.' \ .. |..  / `.\' ' '  \
   |       |                  / / ` ` // `` \  .| . /' ' \\' ' _  \
   |        \__.---------.___/|\ - ` // `. ` \.---./'' .' \\ - _ - |
   |    _.--' ` `` ` ` ` `  /-| \ -  ` = -`. / ___ \ .'- = ||- = - |
   || .'   `` ``  ``` `` `.' -|||::= `---.__/_/   \ \  _.-'||= _ = |
   \ /  ` ``  ` ` ` `.---' [] | |::  ||[_]    ___  \|-'- = ||- _ - |
   |(O]================-------| |    ||[_]   (O__) ||------||- - - |
   / \ ' ''    ''    `---. [] | |::  ||[_]____     /|-._ = ||- _ - |
   || `._ ' ''  ' ''    ' `. -|||::= ,---'  \ \___/ / - `-.||= _ = |
   |     `--.__ ' ' '''' '__\-| / -  , = -.' \     / `. =  ||- = - |
   |        /  `---------'   \|/ _ ' \\ .''' /`---'\`  `. // - _ - |
   |       |                  \_\ ' ' \\ '  /. .|.. \``  // ` `   /
 .--------------------..--.     [\'' ' \`.'/ .. | .. \ .'/ `  ` `/
|  ,               `--||--.\    [/' ' ' `.`-._ .|. _.-'.' `  ` `/
|  `               ,--||--'/     \ ' ' '  `-._`---'_.-' ` ` ` `/
 `--------------------`'--'       \' ' ' '   .`---'.   ` `  ` /
                                   `.'' '' '. . . . `` `  ` .'
                                     `.'' '. . . . . . ` `.'
                                       `-.___ . . . ___.-'
                                             `-----'

Provide the captain's command authorization code for the USB stick:
EOF

    #obtain the key
    mkdir -p /mnt/usbstick
    resolved=$(resolve_device  /dev/disk/by-uuid/1193c881-267f-134f-123a-12b34as56357)
    mount -t ext4 "$resolved" /mnt/usbstick
    cryptsetup -T 5 luksOpen /mnt/usbstick/key.luks OurKey

    #unlock the root partition
    cryptsetup --key-file /dev/mapper/OurKey luksOpen /dev/disk/by-uuid/8197c881-160c-465c-a15c-96b59as26157 lvm

    #clean up the key and release the stick
    cryptsetup luksClose OurKey
    umount /mnt/usbstick
}

The above is a really simple script which uses the UUID of the USB stick in order to find the key and prompt the user to decrypt it. I could not resist putting a Star Trek reference at the prompt. Apologies :). You should replace the UUIDs with your drives’ actual UUIDs and also OurKey with the name you provided for your LUKS encrypted key volume.

In order to use this hook you have to include it in /etc/mkinitcpio.conf in place of encrypt, like so:

HOOKS="... lefcrypt lvm2 ... filesystems ..."
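
The order matters here: the custom hook has to run before lvm2, which in turn has to run before filesystems. A small sanity check can be sketched as follows. The function name hook_before is made up for this example, and the hook list is only an assumed example configuration.

```shell
# Hypothetical check: succeed only if hook $1 appears before hook $2
# in the list of hooks passed as the remaining arguments.
hook_before() {
    first="$1"; second="$2"; shift 2
    seen_first=no
    for hook in "$@"; do
        [ "$hook" = "$first" ] && seen_first=yes
        if [ "$hook" = "$second" ]; then
            [ "$seen_first" = yes ]
            return
        fi
    done
    return 1
}

hooks="base udev autodetect modconf block lefcrypt lvm2 filesystems keyboard fsck"
# $hooks is left unquoted on purpose so that it splits into one word per hook
if hook_before lefcrypt lvm2 $hooks && hook_before lvm2 filesystems $hooks; then
    echo "hook order looks right"
else
    echo "hook order is wrong"
fi
```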

Finally you should create the new initramfs image by issuing mkinitcpio:

mkinitcpio -p linux

==> Building image from preset: /etc/mkinitcpio.d/linux.preset: 'default'
  -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
==> Starting build: 4.6.3-1-ARCH
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
  -> Running build hook: [lefcrypt]
  -> Running build hook: [lvm2]
  -> Running build hook: [filesystems]
  -> Running build hook: [keyboard]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: /boot/initramfs-linux.img
==> Image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux.preset: 'fallback'
  -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux-fallback.img -S autodetect
==> Starting build: 4.6.3-1-ARCH
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: wd719x
==> WARNING: Possibly missing firmware for module: aic94xx
  -> Running build hook: [lefcrypt]
  -> Running build hook: [lvm2]
  -> Running build hook: [filesystems]
  -> Running build hook: [keyboard]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: /boot/initramfs-linux-fallback.img
==> Image generation successful

Now once you reboot you can simply insert the USB stick and be prompted for the passphrase. Congratulations, you now have 2FA in the encryption of your root partition!

Removing the simple passphrase decryption

If you followed this guide step by step then you will still have the option to decrypt the system using the simple passphrase key you created in the first section. If you have confirmed that the USB stick decryption works perfectly then you can safely remove the simple passphrase key.

First check how many keys are used by the encrypted root partition:

cryptsetup luksDump /dev/sda2 | grep BLED

Key Slot 0: ENABLED
Key Slot 1: ENABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

Only two slots should be enabled. Key slot 0 should be the very first simple passphrase key and Key slot 1 the one we just created on the USB stick. You can generally query a lot of information about the keys as can be seen in the related wiki.
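
If you want to check the number of enabled slots from a script before removing anything, the count can be sketched like this. The helper name count_enabled_slots is made up for this example, and it assumes the "Key Slot N: ENABLED" output format shown above; other versions of luksDump may print the slots differently.

```shell
# Hypothetical helper: count enabled key slots in luksDump output read
# from stdin, assuming lines of the form "Key Slot N: ENABLED".
count_enabled_slots() {
    grep -c '^Key Slot [0-9]*: ENABLED'
}

# As root, against the real device:
#   cryptsetup luksDump /dev/sda2 | count_enabled_slots
printf 'Key Slot 0: ENABLED\nKey Slot 1: ENABLED\nKey Slot 2: DISABLED\n' | count_enabled_slots
```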

If you have used different passphrases for the USB encrypted stick and for the normal passphrase key then it’s quite easy to remove the key without even specifying the slot.

cryptsetup luksRemoveKey /dev/sda2
Enter LUKS passphrase to be deleted:

If you have used the same password then you have to specify the slot to remove instead, which is done with luksKillSlot.

cryptsetup luksKillSlot /dev/sda2 0
Enter any remaining LUKS passphrase:

Once this is done, the only way to decrypt your root filesystem and gain access to your machine is by using the key located on the USB stick.

Conclusion

We have presented a way to use a password protected encrypted key located on a USB stick to decrypt the root filesystem of your computer. This provides us with a lot of security and the ability to perform 2-factor authentication when booting the system in order to protect our data if the computer ever falls into the hands of a malicious actor.

There are disadvantages to this approach. If a malicious actor ever gains access to both your passphrase and the USB stick holding your keyfile it is Game Over. At the same time the boot partition needs to be unencrypted to perform the bootloading process. This can introduce vulnerabilities which an attacker could take advantage of. There are some methods that can be followed in order to secure the unencrypted boot partition, such as having it located on an external drive.

The presented method is not perfect, but it provides superior security in comparison to a totally unencrypted system and provides a nice basis from which the curious reader can explore many other methods of disk encryption. I hope you enjoyed this post and please don’t hesitate to leave some comments explaining how you use encryption to protect your data and what kind of improvements you believe can be made in the method presented here.