No USB? No problem.

How to write an open source bit-bang low-speed USB stack running on a sub-$1 Cortex M0+

How to get USB running on an ARM microcontroller that has no built-in USB hardware. We'll cover electrical requirements, pin assignments, and microcontroller considerations, then move all the way up the stack to creating a bidirectional USB HID communications layer entirely in software.

USB is amazing. It's hot-pluggable, auto-negotiating, and reasonably fast. It's robust, capable of supplying power, and works cross-platform. It lives up to the “Universal” claim: your PC definitely has USB, but it may not have TTL Serial, I2C, or SPI available. Hardware USB support is available in all manner of embedded microcontrollers. However, it's not available on all microcontrollers, and integrating a hardware USB PHY can double the cost of a low-end microcontroller. This problem is particularly acute in the sub-$1 microcontrollers: a companion USB PHY chip would typically cost more than the microcontroller itself (for example, the MAX3420E USB-to-SPI adapter costs around $5), so your only option for USB is to get your hands dirty and bit bang the missing protocol.

This talk describes the implementation of a new bitbanged USB stack, starting with a primer on the USB PHY layer and continuing up the stack, concluding with "Palawan", a feature-complete open-source bitbanged USB Low Speed stack available for use on microcontrollers priced under a dollar. We'll go over the requirements for getting USB to work, and talk about USB timing, packet order, and how to integrate everything together.

Unlike other bitbang USB implementations such as V-USB and LemcUSB, Palawan makes fewer assumptions about GPIO layout. With Palawan, USB's D+ and D- signals can be on different GPIO banks, and need not be consecutive. This leaves more pins available to the user and makes Palawan easier to use on devices that restrict which pins can perform which functions. The only requirements are that both GPIO pins can be both inputs and push-pull outputs, and that at least one pin can be used as an interrupt.

Palawan also includes a USB HID firmware update mechanism to allow for updates to be installed even on platforms that normally require USB drivers.

As a protocol, USB comes in multiple speeds. The base speeds are called Full Speed (FS) and Low Speed (LS). FS runs at 12 Mbps, and LS runs at 1.5 Mbps. LS is more restricted in scope than FS. It limits packet data payload size to 8 bytes (down from 64), and only allows Control or Interrupt endpoints (so no Bulk or Isochronous endpoints). While this limits the features we can implement, it also makes the job of implementing them in software simpler. Limiting communications to 8 bytes of payload data also significantly lowers memory requirements.

The core USB PHY layer consists of two functions: USBPhyRead() and USBPhyWrite(). These functions transparently take care of bit stuffing and unstuffing, where an extra 0 bit is inserted after six consecutive 1 bits to force a transition on the line. They also take care of synchronizing reception to the incoming signal, as well as interpreting SE0 end sequences, recognizing USB keepalive packets, and adding the USB SE0 footer. This particular implementation takes care to ensure incoming packets are presented in the correct bit order, as USB transmits each byte least significant bit first.
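As an illustration of the bit stuffing step described above, here is a minimal sketch in portable C. The names `usb_stuff_bits` and `usb_unstuff_bits` are hypothetical helpers for this example, not Palawan functions, and the sketch stores one bit per array element for clarity rather than meeting any cycle budget. It assumes bits are taken from each byte least significant bit first, as on the USB wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: expand `len` bytes into a stuffed bit stream.
 * Bits are taken LSb first; after six consecutive 1 bits a 0 is inserted
 * so the line is guaranteed to toggle. Returns the number of bits written.
 */
size_t usb_stuff_bits(const uint8_t *data, size_t len,
                      uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    unsigned ones = 0;

    for (size_t i = 0; i < len; i++) {
        for (unsigned b = 0; b < 8; b++) {
            uint8_t bit = (uint8_t)((data[i] >> b) & 1u);
            assert(n < out_cap);
            out[n++] = bit;
            ones = bit ? ones + 1 : 0;
            if (ones == 6) {
                assert(n < out_cap);
                out[n++] = 0;      /* stuffed bit */
                ones = 0;
            }
        }
    }
    return n;
}

/*
 * Hypothetical helper: the inverse operation. The bit that follows six
 * consecutive 1s is dropped, and the remaining bits are repacked LSb first.
 * Returns the number of whole bytes recovered.
 */
size_t usb_unstuff_bits(const uint8_t *bits, size_t nbits,
                        uint8_t *data, size_t data_cap)
{
    size_t nout = 0;               /* payload bits recovered so far */
    unsigned ones = 0;

    for (size_t i = 0; i < nbits; i++) {
        if (ones == 6) {           /* this position holds a stuffed bit */
            ones = 0;
            continue;
        }
        uint8_t bit = (uint8_t)(bits[i] & 1u);
        ones = bit ? ones + 1 : 0;
        assert(nout / 8 < data_cap);
        if (nout % 8 == 0)
            data[nout / 8] = 0;
        data[nout / 8] |= (uint8_t)(bit << (nout % 8));
        nout++;
    }
    return nout / 8;
}
```

A real PHY does this work inline while shifting bits onto the pins, but the counting logic is the same.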

Since the PHY code is written using cycle-counting, it must be run from memory with predictable, cycle-accurate access timing. The Kinetis parts we used for testing have variable-cycle flash, so we must first copy the code into RAM and execute it from there. Fortunately, gcc makes it easy to put executable code in the .data section, and automatically generates calls to RAM.
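A minimal sketch of how a function can be placed in RAM with gcc, assuming the startup code copies initialized data from flash to RAM and the linker script collects `.data.*` input sections into that region. The macro name `RAMFUNC`, the section name `.data.ramfunc`, and the demo function are all assumptions made for this example, not names taken from Palawan.

```c
#include <stdint.h>

/*
 * RAMFUNC is a hypothetical macro. On ARM gcc builds it places the function
 * in a .data subsection (assumed to be copied to RAM at startup) and marks
 * it long_call, since RAM is normally out of range of a short branch from
 * flash. On other targets it expands to nothing so the code still builds.
 */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".data.ramfunc"), long_call, noinline))
#else
#define RAMFUNC
#endif

/*
 * Demo function standing in for timing-critical code: counts down `loops`
 * iterations and reports how many were executed.
 */
RAMFUNC uint32_t phy_delay_loops_demo(uint32_t loops)
{
    volatile uint32_t n = loops;   /* volatile keeps the loop from being removed */
    uint32_t done = 0;

    while (n != 0) {
        n--;
        done++;
    }
    return done;
}
```

Whether a particular toolchain needs the explicit `long_call` attribute depends on its linker and veneer support, so treat that part as something to verify against your own build.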

The core of the USB PHY layer is written in Thumb assembly for an ARM Cortex M0+, which implements the ARMv6-M architecture. This is an extremely limited subset of ARM code that removes lots of fun stuff like conditional execution, different source and destination registers in opcodes, as well as DSP instructions. As a tradeoff, most instructions complete in one cycle, with the notable exceptions of branches (which take two cycles if taken) and loads/stores (which take two cycles unless they involve single-cycle IO). Low Speed USB runs at 1.5 Mbit/s, so at 48 MHz we get 32 cycles per bit in which to write the data out to two ports, calculate bit [un]stuffing, check for end-of-packet, and load the next chunk of data for writing.
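The cycle budget above is simply the CPU clock divided by the bit rate. A small sketch of that arithmetic, where `cycles_per_bit` is a hypothetical helper introduced only for this example:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of CPU cycles available per USB bit time.
 * The clock must be an exact multiple of the bit rate, otherwise a
 * cycle-counted loop would drift against the bus.
 */
uint32_t cycles_per_bit(uint32_t cpu_hz, uint32_t bit_rate_hz)
{
    assert(bit_rate_hz != 0);
    assert(cpu_hz % bit_rate_hz == 0);
    return cpu_hz / bit_rate_hz;
}
```

For a 48 MHz core and the 1.5 Mbit/s Low Speed rate, `cycles_per_bit(48000000u, 1500000u)` gives 32. Every taken branch and every non-IO load or store spends two of those cycles.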

The USB PHY layer makes the following assumptions:

+ The controller is a 48 MHz Cortex M0+ with associated two-stage pipeline
+ GPIO is single-cycle access (sometimes referred to as Fast GPIO or FGPIO)
+ GPIO has separate "Set Value" and "Clear Value" banks
+ GPIO pin direction register is 1 for output, 0 for input
+ Code is executing from single-cycle access memory, meaning it may need to execute from RAM
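To make the GPIO assumptions above concrete, here is a sketch that drives the Low Speed line states with D+ and D- on two different banks, using separate set and clear operations. The `gpio_bank_t` struct simulates a bank in ordinary memory so the example can run on a PC; the type names, field names, and `usb_drive` function are hypothetical and do not match any real Kinetis register layout. The line-state encoding (J is D- high and D+ low for Low Speed) follows the USB specification.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated GPIO bank (hypothetical layout, for illustration only). */
typedef struct {
    uint32_t out;   /* output latch */
    uint32_t dir;   /* 1 = output, 0 = input */
} gpio_bank_t;

/* Stand-ins for writes to "Set Value" and "Clear Value" registers. */
static void gpio_set(gpio_bank_t *b, uint32_t mask)   { b->out |= mask; }
static void gpio_clear(gpio_bank_t *b, uint32_t mask) { b->out &= ~mask; }

/* D+ and D- may live on different banks and on arbitrary bit positions. */
typedef struct {
    gpio_bank_t *dp_bank;
    uint32_t     dp_mask;
    gpio_bank_t *dm_bank;
    uint32_t     dm_mask;
} usb_pins_t;

typedef enum { USB_LS_J, USB_LS_K, USB_SE0 } usb_line_state_t;

/* Switch both pins to push-pull outputs before transmitting. */
void usb_pins_output(const usb_pins_t *p)
{
    p->dp_bank->dir |= p->dp_mask;
    p->dm_bank->dir |= p->dm_mask;
}

/* Drive one Low Speed line state. */
void usb_drive(const usb_pins_t *p, usb_line_state_t s)
{
    assert(s == USB_LS_J || s == USB_LS_K || s == USB_SE0);
    switch (s) {
    case USB_LS_J:                       /* idle state: D- high, D+ low */
        gpio_clear(p->dp_bank, p->dp_mask);
        gpio_set(p->dm_bank, p->dm_mask);
        break;
    case USB_LS_K:                       /* D+ high, D- low */
        gpio_clear(p->dm_bank, p->dm_mask);
        gpio_set(p->dp_bank, p->dp_mask);
        break;
    case USB_SE0:                        /* both lines low */
        gpio_clear(p->dp_bank, p->dp_mask);
        gpio_clear(p->dm_bank, p->dm_mask);
        break;
    }
}
```

Because each pin has its own bank pointer and mask, nothing in this sketch requires the two signals to be adjacent or even on the same port.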

Despite these limitations, this code has been ported to two different Freescale/NXP Kinetis parts under a variety of operating systems. These assumptions aren't terribly restrictive, meaning this core could easily be ported to other M0+ implementations.

Other bit-banged USB implementations make assumptions that were not useful for our implementation. V-USB impressively works on an AVR microcontroller across a range of frequencies, but it targets the wrong architecture and uses special timer modes unavailable on ARM. LemcUSB is conceptually similar to Palawan and is available for other M0+ chips, and in fact can run at a lower clock speed of 24 MHz. However, LemcUSB requires that D+ and D- be on a GPIO bank's pins 0 and 1 respectively, which are not available on all chips, or may conflict with the SWD pins. Additionally, the M0+ ISA has no instruction for reversing bit order, so LemcUSB's low-level PHY functions return data bit-reversed. Palawan takes care to load bits in the correct order, saving a step when examining the packet.
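The bit-ordering point can be sketched in a few lines of C. USB sends each byte least significant bit first, so shifting each received bit in from the top of the accumulator yields the byte in its natural order, while shifting in from the bottom yields a bit-reversed byte that needs a software fix-up. Both function names below are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: accumulate one received bit, LSb-first on the wire.
 * After eight calls the first bit received sits in bit 0.
 */
uint8_t shift_in_lsb_first(uint8_t acc, uint8_t bit)
{
    return (uint8_t)((acc >> 1) | ((bit & 1u) << 7));
}

/*
 * Hypothetical helper: software bit reversal of one byte, the extra step
 * needed when bits were accumulated in the opposite direction.
 */
uint8_t reverse_bits8(uint8_t v)
{
    v = (uint8_t)(((v & 0xF0u) >> 4) | ((v & 0x0Fu) << 4));  /* swap nibbles */
    v = (uint8_t)(((v & 0xCCu) >> 2) | ((v & 0x33u) << 2));  /* swap pairs */
    v = (uint8_t)(((v & 0xAAu) >> 1) | ((v & 0x55u) << 1));  /* swap bits */
    return v;
}
```

For example, the SETUP PID byte 0x2D arrives on the wire as the bit sequence 1, 0, 1, 1, 0, 1, 0, 0. Feeding those bits through `shift_in_lsb_first` produces 0x2D directly, whereas a left-shifting accumulator would produce 0xB4 and need `reverse_bits8` to recover 0x2D.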

Our sample implementation is accompanied by a bootloader that provides USB HID communication. This allows for driver-free firmware updates even on Windows, which normally requires a signed driver installation. This USB HID code can act as a keyboard, but it is also bidirectional and can be used to upload firmware to the device. While there are bootloader HID implementations from companies such as NXP and Microchip, we are unaware of any general-purpose open-source USB HID bootloader created with the intention of providing firmware updates.

Saal G
4 p.m.
Hardware & Making
Sean "xobs" Cross

Talk & Speaker speed statistics

Very rough underestimation:
138.3 wpm
912.9 spm
