shocco is a quick-and-dirty, literate-programming-style documentation generator written for and in POSIX shell. It borrows liberally from Docco, the original Q&D literate-programming-style doc generator.
shocco(1)
reads shell scripts and produces annotated source documentation
in HTML format. Comments are formatted with Markdown and presented
alongside syntax highlighted code so as to give an annotation effect. This
page is by default the result of running shocco
against its own source file.
shocco is built with make(1)
and installs under /usr/local
by default:
git clone git://github.com/velwant/shocco.git
cd shocco
make
sudo make install
# or just copy 'shocco' wherever you need it
Once installed, the shocco
program can be used to generate documentation
for a shell script:
shocco shocco.sh
The generated HTML is written to stdout.
#!/bin/sh
The most important line in any shell program.
set -e
There's a lot of different ways to do usage messages in shell scripts.
This is my favorite: you write the usage message in a comment --
typically right after the shebang line -- BUT, use a special comment prefix
like #/
so that it's easy to pull these lines out.
This also illustrates one of shocco's corner features. Only comment lines
padded with a space are considered documentation. A #
followed by any
other character is considered code.
#/ Usage: shocco [<options>] [<source>]
#/ Create literate-programming-style documentation for shell scripts.
#/
#/ The shocco program reads a shell script from <source> and writes
#/ generated documentation in HTML format to stdout. When <source> is
#/ '-' or not specified, shocco reads from stdin.
#/
#/ Options:
#/   -t,--title <title>  Specify a custom title (defaults to filename)
#/   -r,--rst            Treat comments as reStructuredText not Markdown
#/   -c,--css            CSS URL, local file, or Pygments style name
#/   -d,--debug          Keep the temporary work dir instead of removing it
#/   -h,--help           Show this usage text
This is the second part of the usage message technique: grep
yourself
for the usage message comment prefix and then cut off the first few
characters so that everything lines up.
usage () {
    grep '^#/' <"$0" | cut -c4-
}
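To see the technique in isolation, here is a minimal sketch using a throwaway script (the filename and contents are made up for illustration, not part of shocco):

```shell
# Create a hypothetical self-documenting script in a temp file.
tmp=$(mktemp) || exit 1
cat > "$tmp" <<'EOF'
#!/bin/sh
#/ Usage: example [<options>]
#/ A tiny script that documents itself.
echo "doing real work"
EOF

# grep keeps only the '#/' lines; cut -c4- drops the 3-character prefix
# so the usage text lines up flush left.
grep '^#/' <"$tmp" | cut -c4-

rm -f "$tmp"
```

Running this prints the two usage lines with the `#/ ` prefix stripped.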
These are replaced with the full paths to real utilities by the configure/make system.
PYGMENTIZE='@@PYGMENTIZE@@'
MARKDOWN='@@MARKDOWN@@'
RST2HTML='@@RST2HTML@@'

processor="$MARKDOWN"
while test $# -gt 0
do
    case "$1" in
        -h|--help)
            usage
            exit 0 ;;
        -t|--title)
            title="$2"
            shift; shift ;;
        -c|--css)
            cssurl="$2"
            shift; shift ;;
        -d|--debug)
            DEBUG="1"
            shift ;;
        -r|--rst)
            processor="$RST2HTML"
            shift ;;
        *)
            break ;;
    esac
done

if test $# -gt 1
then
    echo "$(basename $0): cannot specify multiple input files" 1>&2
    echo 1>&2
    usage 1>&2
    exit 1
fi
Next argument should be the <source>
file. Grab it, and use its basename
as the title if none was given with the -t
option.
file="$1"
: ${title:=$(basename "$file")}
WORKDIR
is the base directory used to resolve relatively addressed CSS stylesheets:
: ${WORKDIR:=$(pwd)}
On GNU systems, csplit doesn't elide empty files by default:
CSPLITARGS=$( (csplit --version 2>/dev/null | grep -i gnu >/dev/null) && echo "--elide-empty-files" || true )
We're going to need a markdown
command to run comments through. This can
be Gruber's Markdown.pl
(included in the shocco distribution) or
Discount's super fast markdown(1)
in C. Try to figure out if either are
available and then bail if we can't find anything.
command -v "$MARKDOWN" >/dev/null || {
    if command -v Markdown.pl >/dev/null
    then MARKDOWN='Markdown.pl'
    elif test -f "$(dirname $(readlink $0))/Markdown.pl"
    then MARKDOWN="$(dirname $(readlink $0))/Markdown.pl"
    else echo "$(basename $0): markdown command not found." 1>&2
         exit 1
    fi
}
Check that Pygments is installed for syntax highlighting.
This is a fairly hefty prerequisite. Eventually, I'd like to fallback
on a simple non-highlighting preformatter when Pygments isn't available. For
now, just bail out if we can't find the pygmentize
program.
command -v "$PYGMENTIZE" >/dev/null || {
    echo "$(basename $0): pygmentize command not found." 1>&2
    exit 1
}
Make sure we have a TMPDIR
set. The :=
parameter expansion assigns
the value if TMPDIR
is unset or null.
: ${TMPDIR:=/tmp}
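A quick sketch of the `:=` semantics, using a throwaway variable name (`DIR` here is hypothetical):

```shell
# ':' is the shell no-op; it evaluates its arguments, which triggers
# the parameter expansion without doing anything else.
unset DIR
: ${DIR:=/tmp}        # DIR is unset, so ':=' assigns the default
echo "$DIR"           # prints: /tmp

DIR=/var/tmp
: ${DIR:=/tmp}        # DIR is set and non-null, so it is left alone
echo "$DIR"           # prints: /var/tmp
```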
Create a temporary directory for doing work. Use mktemp(1)
if
available; but, since mktemp(1)
is not POSIX specified, fallback on naive
(and insecure) temp dir generation using the program's basename and pid.
: ${WORK:=$(
    if command -v mktemp 1>/dev/null 2>&1
    then
        mktemp -d "$TMPDIR/$(basename $0).XXXXXXXXXX"
    else
        dir="$TMPDIR/$(basename $0).$$"
        mkdir "$dir"
        echo "$dir"
    fi
)}
We want to be absolutely sure we're not going to do something stupid like
use .
or /
as a work dir. Better safe than sorry.
test -z "$WORK" -o "$WORK" = '/' && {
    echo "$(basename $0): could not create a temp work dir."
    exit 1
}
We're about to create a ton of shit under our $WORK
directory. Register
an EXIT
trap that cleans everything up. This guarantees we don't leave
anything hanging around unless we're killed with a SIGKILL.
[ -z "${DEBUG}" ] && trap "rm -rf $WORK" 0 || trap "echo $WORK 1>&2" 0
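The EXIT trap mechanism can be seen in isolation with a throwaway subshell (the messages here are made up; only the trap behavior matters):

```shell
# An EXIT trap (condition 0) runs as the shell exits normally, which
# is what lets shocco clean up its work dir without explicit rm calls
# at every exit point.
sh -c '
    trap "echo cleaning up" 0
    echo working
'
# prints: working
# then:   cleaning up
```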
Start out by applying some light preformatting to the <source>
file to
make the code and doc formatting phases a bit easier. The result of this
pipeline is written to a temp file under the $WORK
directory so we can
take a few passes over it.
Get a pipeline going with the <source>
data. We write a single blank
line at the end of the file to make sure we have an equal number of code/comment
pairs.
(cat "$file" && printf "\n\n# \n\n") |
We want the shebang line and any code preceding the first comment to appear as the first code block. This inverts the normal flow of things. Usually, we have comment text followed by code; in this case, we have code followed by comment text.
Read the first code and docs headers and flip them so the first docs block comes before the first code block.
(
    lineno=0
    codebuf=;codehead=
    docsbuf=;docshead=
    while read -r line
    do
Issue a warning if the first line of the script is not a shebang line. This can screw things up and wreck our attempt at flip-flopping the two headings.
        lineno=$(( $lineno + 1 ))
        test $lineno = 1 && ! expr "$line" : "#!.*" >/dev/null &&
        echo "$(basename $0): ${file}:1 [warn] shebang! line missing." 1>&2
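The expr pattern match used here prints the number of characters matched and exits zero on success, nonzero otherwise; a quick sketch with made-up input lines:

```shell
# expr STRING : PATTERN anchors PATTERN at the start of STRING and
# prints the length of the match.
line='#!/bin/sh'
expr "$line" : '#!.*'                 # prints: 9 (all nine chars match)

line='echo hello'
expr "$line" : '#!.*' || echo "no shebang"   # expr prints 0 and fails
```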
Accumulate comment lines into $docsbuf
and code lines into
$codebuf
. Only lines matching /#(?: |$)/
are considered doc
lines.
        if expr "$line" : '# ' >/dev/null || test "$line" = "#"
        then docsbuf="$docsbuf$line
"
        else codebuf="$codebuf$line
"
        fi
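The classification test can be factored into a helper for experimentation; `is_docs` is a hypothetical name used only for this sketch, not part of shocco:

```shell
# Only '# ' (hash, space) or a bare '#' counts as documentation;
# shebangs and unpadded '#comment' lines stay with the code.
is_docs () {
    expr "$1" : '# ' >/dev/null || test "$1" = "#"
}

is_docs '# a real doc comment' && echo "docs"    # prints: docs
is_docs '#!/bin/sh'            || echo "code"    # prints: code
is_docs '#no-space-comment'    || echo "code"    # prints: code
```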
If we have stuff in both $docsbuf
and $codebuf
, it means
we're at some kind of boundary. If $codehead
isn't set, we're at
the first comment/doc line, so store the buffer to $codehead
and
keep going. If $codehead
is set, we've crossed into another code
block and are ready to output both blocks and then straight pipe
everything by exec
'ing cat.
        if test -n "$docsbuf" -a -n "$codebuf"
        then
            if test -n "$codehead"
            then
                docshead="$docsbuf"
                docsbuf=""
                printf "%s" "$docshead"
                printf "%s" "$codehead"
                echo "$line"
                exec cat
            else
                codehead="$codebuf"
                codebuf=
            fi
        fi
    done
We made it to the end of the file without a single comment line, or there was only a single comment block ending the file. Output our docsbuf or a fake comment and then the codebuf or codehead.
    echo "${docsbuf:-#}"
    echo "${codebuf:-"$codehead"}"
) |
Remove comment leader text from all comment lines. Then prefix all
comment lines with "DOCS" and all other (code) lines with "CODE".
The stream text might look like this after moving through the sed
filters:
CODE #!/bin/sh
CODE #/ Usage: shocco <file>
DOCS Docco for and in POSIX shell.
CODE
CODE PATH="/bin:/usr/bin"
CODE
DOCS Start by numbering all lines in the input file...
...
Once we pass through sed
, save this off in our work directory so
we can take a few passes over it.
sed -n '
    s/^/:/
    s/^:[ ]\{0,\}# /DOCS /p
    s/^:[ ]\{0,\}#$/DOCS /p
    s/^:/CODE /p
' > "$WORK/raw"
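The sed filter can be exercised on a few synthetic lines to confirm the tagging (the input lines here are made up):

```shell
# Each line is first prefixed with ':' as a sentinel; doc lines have
# the sentinel-plus-comment-leader rewritten to 'DOCS ', and whatever
# still carries the sentinel is tagged 'CODE '.
printf '%s\n' '#!/bin/sh' '# A doc comment' 'echo hello' |
sed -n '
    s/^/:/
    s/^:[ ]\{0,\}# /DOCS /p
    s/^:[ ]\{0,\}#$/DOCS /p
    s/^:/CODE /p
'
# Output:
#   CODE #!/bin/sh
#   DOCS A doc comment
#   CODE echo hello
```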
Now that we've read and formatted our input file for further parsing, change into the work directory. The program will finish up in there.
cd "$WORK"
Start a pipeline going on our preformatted input. Replace all CODE lines with entirely blank lines. We're not interested in code right now, other than knowing where comments end and code begins and code begins and comments end.
sed 's/^CODE.*//' < raw |
Now squeeze multiple blank lines into a single blank line.
TODO: cat -s
is not POSIX and doesn't squeeze lines on BSD. Use
the sed line squeezing code mentioned in the POSIX cat(1)
manual page
instead.
cat -s |
At this point in the pipeline, our stream text looks something like this:
DOCS Now that we've read and formatted ...
DOCS change into the work directory. The rest ...
DOCS in there.
DOCS First Pass: Comment Formatting
DOCS ------------------------------
Blank lines represent code segments. We want to replace all blank lines with a dividing marker and remove the "DOCS" prefix from docs lines.
sed 's/^$/DOCS \
**DIVIDER**\
DOCS /' |
sed 's/^DOCS //' |
Save a copy in case we need to debug issues with $processor.
tee "$WORK/input" |
The current stream text is suitable for input to markdown(1)
. It takes
our doc text with embedded DIVIDER
s and outputs HTML.
$processor |
Now this is where shit starts to get a little crazy. We use csplit(1)
to
split the HTML into a bunch of individual files. The files are named
as docs0000
, docs0001
, docs0002
, ... Each file includes a single
doc section. These files will sit here while we take a similar pass over
the source code.
(
    csplit -sk \
        $CSPLITARGS \
        -f docs \
        -n 4 \
        - '/<strong>DIVIDER<\/strong>/' '{9999}' \
        2>/dev/null ||
    true
)
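csplit's pattern-splitting behavior can be sketched on a tiny synthetic stream; the marker text, file prefix, and input here are made up for illustration:

```shell
work=$(mktemp -d) || exit 1
cd "$work"

printf '%s\n' one DIVIDER two DIVIDER three > input

# Split before each line matching /DIVIDER/; '{1}' repeats the pattern
# once more. -s silences the byte counts, -k keeps files on error,
# -n 2 uses two-digit suffixes.
csplit -sk -f part -n 2 input '/DIVIDER/' '{1}'

cat part00    # prints: one
cat part01    # prints: DIVIDER, then two

cd / && rm -rf "$work"
```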
This is exactly like the first pass but we're focusing on code instead of comments. We use the same basic technique to separate the two and isolate the code blocks.
Get another pipeline going on our preformatted input file. Replace DOCS lines with blank lines.
sed 's/^DOCS.*//' < raw |
Squeeze multiple blank lines into a single blank line.
cat -s |
Replace blank lines with a DIVIDER
marker and remove prefix
from CODE
lines.
sed '
    s/^$/# DIVIDER/
    s/^CODE //' |
Now pass the code through pygmentize
for syntax highlighting. We tell it
the input is sh
and that we want HTML output.
$PYGMENTIZE -l sh -f html -O encoding=utf8 |
Post filter the pygments output to remove partial <pre>
blocks. We add
these back in at each section when we build the output document.
sed '
    s/<div class="highlight"><pre>//
    s/^<\/pre><\/div>//' |
Again with the csplit(1)
. Each code section is written to a separate
file, this time with a codeXXX
prefix. There should be the same number
of codeXXX
files as there are docsXXX
files.
(
    DIVIDER='/<span class="c[1|a-zA-Z]*"># DIVIDER</span>/'
    csplit -sk \
        $CSPLITARGS \
        -f code \
        -n 4 - \
        "$DIVIDER" '{9999}' \
        2>/dev/null ||
    true
)
At this point, we have separate files for each docs section and separate files for each code section.
Create a function for applying the standard Docco HTML layout, using jashkenas's gorgeous CSS for styles. Wrapping the layout in a function lets us apply it elsewhere simply by piping in a body. The layout function serves as the template for the final HTML output.
The original version of shocco used only the classic Docco CSS from jashkenas's repository.
Now you can choose any of the original layouts (classic, linear, or parallel) by name:
~$ shocco --css "linear"
Or import a stylesheet from the network by URL:
~$ shocco --css "http://example.com/style.css"
Or include the contents of a local stylesheet:
~$ shocco --css "./shocco.css"
You can also select any Pygments color style by name; see the list of
supported styles with pygmentize -L.
For example, to include the CSS for the emacs
scheme:
~$ shocco --css "emacs"
By default this page is generated with the linear scheme and no extra CSS styles.
But using the CSS from the file shocco.css,
you can produce a parallel layout.
For an example, see the git repository's index page, which was generated with:
~$ shocco --css "./shocco.css emacs" ./shocco.sh > index.html
csscode="""
/* CSS applied to all layouts */
body {
    font-family: sans-serif;
}
"""
for i in ${cssurl}
do
    case $i in
        classic|linear|parallel)
            csscode="""$csscode@import url('http://jashkenas.github.io/docco/resources/$i/docco.css');
"""
            ;;
        manni|igor|lovelace|xcode|vim|\
        autumn|abap|vs|rrt|native|perldoc|\
        borland|tango|emacs|friendly|\
        monokai|paraiso-dark|colorful|\
        murphy|bw|pastie|rainbow_dash|\
        algol_nu|paraiso-light|trac|default|algol|fruity)
            csscode="$csscode$(${PYGMENTIZE} -S $i -f html)"
            ;;
        http*)
            csscode="""$csscode@import url($i);
"""
            ;;
        *)
            case "${i}" in
                /*)
                    if [ -f "${i}" ] ; then
                        csscode="$csscode$(cat ${i})"
                    fi
                    ;;
                *)
                    if [ -f "${WORKDIR}/${i}" ] ; then
                        stylesheet=$(cat "${WORKDIR}/${i}")
                        csscode="$csscode$stylesheet"
                    else
                        echo "Unknown css parameter ${i}" 1>&2
                    fi
                    ;;
            esac
            ;;
    esac
done

layout () {
    cat <<HTML
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv='content-type' content='text/html;charset=utf-8'>
    <title>$1</title>
    <style>
${csscode}
    </style>
</head>
<body>
<div id=container>
    <div id=background></div>
    <div class=header><h1>$1</h1></div>
    <div class=docs>$(cat)</div>
</div>
</body>
</html>
HTML
}
Alright, we have separate files for each docs section and separate files for each code section. We've defined a function to wrap the results in the standard layout. All that's left to do now is put everything back together.
Before starting the pipeline, decide the order in which to present the
files. If code0000
is empty, it should appear first so the remaining
files are presented docs0000
, code0001
, docs0001
, and so on. If
code0000
is not empty, docs0000
should appear first so the files
are presented docs0000
, code0000
, docs0001
, code0001
and so on.
Ultimately, this means that if code0000
is empty, the -r
option
should not be provided with the final -k
option group to sort(1) in
the pipeline below.
if stat -c"%s" /dev/null >/dev/null 2>/dev/null ; then
GNU stat
[ "$(stat -c"%s" "code0000")" = 0 ] && sortopt="" || sortopt="r"
else
BSD stat
[ "$(stat -f"%z" "code0000")" = 0 ] && sortopt="" || sortopt="r"
fi
Start the pipeline with a simple list of the split-out temp filenames, one file per line.
ls -1 docs[0-9]* code[0-9]* 2>/dev/null |
Now sort the list of files by the number first and then by the type. The
list will look something like this when sort(1)
is done with it:
docs0000
code0000
docs0001
code0001
docs0002
code0002
...
sort -n -k"1.5" -k"1.1$sortopt" |
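The effect of these sort keys can be checked on a hand-written file list; a small sketch (the filenames are synthetic, matching the docsXXXX/codeXXXX pattern):

```shell
# -k1.5 starts a key at character 5 (the digits); the global -n makes
# it numeric. -k1.1r is a reversed text key from character 1, so
# 'docs' sorts before 'code' within each number.
printf '%s\n' code0000 docs0000 code0001 docs0001 |
sort -n -k"1.5" -k"1.1r"
# Output:
#   docs0000
#   code0000
#   docs0001
#   code0001
```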
And if we pass those files to cat(1)
in that order, it concatenates them
in exactly the way we need. xargs(1)
reads from stdin
and passes each
line of input as a separate argument to the program given.
We could also have written this as:
cat $(ls -1 docs* code* | sort -n -k1.5 -k1.1r)
I like to keep things to a simple flat pipeline when possible, hence the
xargs
approach.
xargs cat |
Run a quick substitution on the embedded dividers to turn them into table
rows and cells. This also wraps each code block in a <div class=highlight>
so that the CSS kicks in properly.
{
    DOCSDIVIDER='<p><strong>DIVIDER</strong></p>'
    DOCSREPLACE='</pre></div><div class=break></div><div class=docs>'
    CODEDIVIDER='<span class="c[1|a-zA-Z]*"># DIVIDER</span>'
    CODEREPLACE='</div><div class=code><pre>'
    sed "
        s@${DOCSDIVIDER}@${DOCSREPLACE}@
        s@${CODEDIVIDER}@${CODEREPLACE}@
    "
} |
Pipe our recombined HTML into the layout and let it write the result to
stdout.
layout "$title"
shocco is the third tool in a growing family of quick-and-dirty, literate-programming-style documentation generators.
If you like this sort of thing, you may also find Knuth's massive body of work on literate programming interesting.
Copyright (C) Ryan Tomayko
This is Free Software distributed under the MIT license.