`rc`_: hacking notes
====================

.. _rc: ../

``input.c`` cleanup
-------------------

With ``rc-1.7.1`` released, I'm starting to look at wider issues.  Top
of my list is to improve the way *rc* interacts with line editing
libraries.  Partly this is because I want to support the *BSD*
*editline* library.  Partly I want to improve the support for
*readline*, which I have a nagging feeling is still buggy in places.
Partly I want to clean up some of the ugly code in ``input.c``: there
are only three appearances of ``#if READLINE || EDITLINE``, which
isn't too bad, but one of these surrounds part of an ``if / else``
clause in a way that triggers my "*yeuch!*" detectors.

As a first step, and as part of my longer term goal of improving
the modularization of ``rc``, I have created
``input.h``, the interface file for the functions in
``input.c``, now separate from ``rc.h``.
Thus, these functions are only known in those other source files
which actually need them (which is a fairly disparate bunch).
Time permitting, I will eventually remove almost everything from
``rc.h``.

Now, as well as function interfaces, the new ``input.h`` defines a
couple of global variables.  One is ``rcrc``, which is only used to
suppress error messages when ``builtins.c::b_dot()`` is called from
``input.c::doit()`` to source the user's ``.rcrc`` file.  Although it
might seem like good code reuse to call ``b_dot()`` here, I'm
unconvinced.  It means that we have to fake up an ``argv`` array in
``doit()``.  And ``b_dot()`` does things like pushing an exception to
save ``$*``, and making various checks on whether we're interactive or
not: all totally pointless in this case.

So now, ``doit()`` simply opens the ``.rcrc``
file directly (and calls itself to parse it).  This, in my
opinion, tidies up both ``doit()`` and
``b_dot()`` slightly, as well as removing the global
variable.  As a bonus, I changed the code so that only
``ENOENT`` is silently ignored.  If, for example, the
user has no read permission on their ``.rcrc``, they
now get an error message.

Anyone who can find a use for the newly available
``$*`` in ``.rcrc`` will receive the dubious
distinction of it being detailed here!

On further reflection, this code doesn't belong in
``input.c`` anyway.  Unless there's something subtle
going on that I'm missing, it makes more sense in
``main.c``.  This also eliminates another global
variable - ``dashell``.

*2003-09-00*

``rc-1.7.1`` released
---------------------

*2003-07-17*

``history`` bug fixes
---------------------

Such a small program; so many bugs.

*Callum Gibson* sent a patch to fix a core dump in
his previous patch for "stuttering" colons (the core dump was
triggered by specifying more colons than there are possible
substitutions).

While testing that patch, I came across the following
``fork()`` bomb::

    ; > $history
    ; -

The code in ``history`` that skips prior invocations of the
history program itself works most of the time, but if *all* the
entries in the history file are ``history`` itself, it
returns the first one anyway.

While fixing that, I noticed the horrid code in
``history.c::getcommand()``, which would merrily read and write the
character before the allocated ``hist`` buffer.  The least disruptive
change I could make was to add a "sentinel", so these accesses are now
valid.  (It's a rather unusual sentinel: its value is immaterial, just
its location is used.)

*2002-11-27*

``sigaction()`` tidy up
-----------------------

*Jeremy Fitzhardinge* pointed out a couple of
errors around ``sigaction()`` that were reported by the remarkable
`valgrind`_.

.. _valgrind: http://developer.kde.org/~sewardj/

* It is incorrect to call ``sigaction()`` for
  ``SIGKILL`` or ``SIGSTOP``, which was happening in
  ``signal.c::initsignal()``.  Although real Unices silently ignore
  this, we're trying to be correct, right?  Now, is anybody compiling
  *rc* anywhere that lacks ``SIGSTOP``?  Or even ``SIGKILL``?  I doubt
  it, but I wrapped them in ``#ifdef`` anyway.  Hmm...  hmm...  this
  is pretty ugly.  Maybe ``mksignal``, which checks if these signals
  exist anyway, should flag them as not being installable?

* When I added the ``sigaction()`` code, I omitted to set the
  ``sa_mask`` field of the handler to be installed.  Should we
  mask everything, or nothing?  For now, I've taken what seems to be the
  safer route of masking everything,
  i.e. ``sigfillset(&amp;new.sa_mask)``.


While looking at ``signal.c``, and `SUSv3`_, I noticed the curious use
of ``SA_INTERRUPT``.  Notwithstanding that I wrote this code, I
actually have no idea where ``SA_INTERRUPT`` comes from: both *SUSv3*
and *BSD* have ``SA_RESTART`` instead (with the inverted semantics),
and *Linux* comments ``SA_INTERRUPT`` as a "historical no-op".  Memo
to self: must smoke better drugs.

.. _susv3: http://www.unix-systems.org/single_unix_specification/

Since the configury checked for both ``sigaction()`` and
``SA_INTERRUPT`` before enabling the ``sigaction()`` code, we were
using the old ``signal()``, and the expensive test for restartable
syscalls, and the grody code to make syscalls not restart, in many
places where they were not necessary.  For example, on *NetBSD/i386*,
the development version of *rc* runs its configure script in 3.3s as
opposed to 9.3s for ``rc-1.7``.  In addition, the binary is 175 bytes
smaller.  Oh, and the ``configure`` script itself is over 1k smaller.

*2002-08-15*

The globbing of dangling symlinks bug
-------------------------------------

I actually noticed this bug before the release of ``rc-1.7``,
but then forgot about it again.  Darn.



Certain glob patterns fail to match dangling symlinks.  (A dangling
symlink is one whose target doesn't exist, or is inaccessible.)  More
specifically, if the dangling symlink's name matches a fixed part of
the pattern, it is incorrectly ignored.  For example::

    ; mkdir qux
    ; ln -s /nonesuch qux/foo
    ; echo qux/foo*
    qux/foo
    ; echo qux*/foo
    qux*/foo

The bug was, needless to say, the use of ``stat()`` instead of
``lstat()`` in ``glob.c::dmatch()``, so this really was a
one-character fix!  I also added a test to ``trip.rc``, and configury
to ``#define`` away the ``lstat()`` call into a plain ol' ``stat()``
on systems that lack ``lstat()``.  (It's 14 years since I used a
system without ``lstat()``, and that didn't have anything resembling
an ANSI C compiler, but I realise I've had a somewhat sheltered
upbringing.)

You might wonder about the other call to ``stat()`` in
``glob.c::dmatch()``.  This is correct, because at this point
we want to know whether the name in question is a directory or not.

You might also wonder about the reason for the inconsistency shown
above (i.e. ``qux/foo*`` works, but ``qux*/foo``
doesn't).  It's fairly simple: globbing takes each component of the
path name in turn.  If the component we are looking at contains a glob
metacharacter (e.g. ``foo*``) then we ``readdir()``
the relevant directory, and pull out any matches.  This works, even
for dangling symlinks.  On the other hand, if the component we are
looking at contains no metacharacters (e.g. ``foo``), then we
generate the complete pathname and ``lstat()`` it.


*2002-08-14*

``rc-1.7`` released
-------------------

*2002-06-18*

``version`` tidy up
-------------------

*Erik Quanstrom* brought this up; *Paul Haahr* and *Carlo Strozzi*
made useful comments.

I'd made the ``version`` variable far too magical.  I'd added it to
``var.c::varlookup()``, next to ``apids`` and ``status``.  It was also
marked "not for export" in ``hash.c::var_exportable()``.

Initially I planned simply to give it a default value on startup (in
``main.c::main()``), but this means that any descendant *rc* processes
inherit the value, which is not useful.  So, there's a new array of
variable names and flags (``struct nameflag maybeexport[]`` in
``hash.c``).  It contains entries only for ``version`` and ``prompt``
(which seemed like it would benefit from the same treatment).  The
flag is set to ``FALSE`` in ``main.c::assigndefault()``, and to
``TRUE`` in ``var.c::varassign()``.

While I was looking at this, I noticed that ``bqstatus`` and
``status`` were exportable.  I changed them to be not exportable.

*2002-04-03*

``limit`` code tidy up
----------------------

Thanks to *Chris Siebenmann* for pointing out the flaws in the old
code.  The fundamental problem was that ``builtins.c::parselimit()``
used negative values for error returns, but on some systems
``RLIM_INFINITY`` is negative (e.g.  ``-1`` on *Linux*; ``-3`` on
*Solaris*.) The result is that on these systems, any attempt to set a
resource to ``unlimited`` fails::

    ; limit stacksize unlimited
    bad limit

The ``parselimit()`` code has other flaws.  For example, an attempt to
parse ``futhark`` results in a return value of ``-1024``; parsing
``boring`` gives ``-1073741824``.  (In each case, the final character
is taken as a multiplier.) All of these are considered ``bad limit``\s.
And ``1:fifteen`` parses as ``59``!

Clearly, overloading the return value from ``parselimit()`` was a
mistake.  I followed the obvious route, changing ``parselimit()`` into
a predicate that returns ``TRUE`` or ``FALSE`` according to whether it
could parse its argument or not, with the result of the parse returned
in a pointer argument.  Along the way I noticed that the return value
of ``parselimit()`` is stuffed into an ``int`` variable.  This error
doubtless goes back to my autoconfiscation of *rc*, when I introduced
the use of ``rlim_t``.

I also taught *rc* about some new limits.  ``RLIMIT_AS`` is the
*SUSv2* spelling of ``RLIMIT_VMEM``; in *Solaris*, they are synonymous
(despite being documented separately).  *Linux* has both ``RLIMIT_AS``
and ``RLIMIT_RSS``, which apparently do have subtly different
meanings.  *Linux* also adds ``RLIMIT_NPROC``, ``RLIMIT_MEMLOCK``, and
``RLIMIT_LOCKS``.

There are no standard English names for the newer limits.  Here's a
table which shows what I used, and compares it to several other
popular shells.  "N/A" means that the limit is not available in
this operating system; "unimp" means that the limit is not
implemented in this shell.

.. table::
 :class: tableborder

 ============= =========== =========== ============ ============= ======= 
 macro          rc          Linux csh   NetBSD csh   Solaris csh    bash  
 ============= =========== =========== ============ ============= ======= 
 CPU            cputime                                            -t     
 ------------- -------------------------------------------------- ------- 
 FSIZE          filesize                                           -f     
 ------------- -------------------------------------------------- ------- 
 DATA           datasize                                           -d     
 ------------- -------------------------------------------------- ------- 
 STACK          stacksize                                          -s     
 ------------- -------------------------------------------------- ------- 
 CORE           coredumpsize                                       -c     
 ------------- -------------------------------------------------- ------- 
 NOFILE         descriptors             openfiles    descriptors   -n     
 ------------- ----------------------- ------------ ------------- ------- 
 AS (or VMEM)   memoryuse   unimp       N/A          memorysize    -v     
 ------------- ----------- ----------- ------------ ------------- ------- 
 RSS            memoryrss   memoryuse                N/A           -m     
 ------------- ----------- ------------------------ ------------- ------- 
 NPROC          maxproc                              N/A           -u     
 ------------- ------------------------------------ ------------- ------- 
 MEMLOCK        memorylocked                         N/A           -l     
 ------------- ------------------------------------ ------------- ------- 
 LOCKS          filelocks   unimp       N/A          N/A           unimp  
 ============= =========== =========== ============ ============= ======= 

So, at the time of writing, *rc* has one feature that *bash* doesn't!

*2001-11-23*

Large file support
------------------

Thanks to *Scott Schwartz* for pointing out the
need for this, and *Chris Siebenmann* for helping
me with the implementation.  Seems to be trivial, using the
``AC_SYS_LARGEFILE`` macro available in recent versions of
*autoconf*.



There was just one wrinkle: under *Linux* at least, it is vital to define
``_FILE_OFFSET_BITS`` before including (any header file which includes)
``features.h``.  At one point, I had the moral equivalent of this, broken,
code::

    #include <assert.h>
    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>

*2001-10-25*

``fn prompt`` tailspin
----------------------

Amazingly, I'm not aware of this having been reported as a bug before.
I came across it while doing something completely different.

If there is a *semantic error* in ``fn prompt``, *rc* goes into a
tailspin, and must be killed.  For example::

    ; fn prompt { echo (a b)^(c d e) }
    bad concatenation
    bad concatenation
    ... [ ad infinitum ]

A semantic error is anything that throws an ``eError`` exception.
Considerably more insidious cases can be devised: ``fn prompt { echo
$a^$b }`` may work fine till ``a`` and ``b`` take on unfortunate
values.

My initial fix was to push a new exception on the stack before calling
``fn prompt``, but on reflection I realised that a simple flag
variable achieves the same goal.

*2001-10-05*

The noisy lists bug
-------------------

Again, this isn't exactly a bug, more of a misfeature::

    ; fn x { echo (a b c d e) }
    ; whatis x
    fn x {echo ((((a b) c) d) e)}

This was easy to fix.  Simply keep a state variable inside
``footobar.c::Tconv()`` so that we only print the parentheses once.


You might be wondering why I chose a ``static`` variable
here, but introduced the "alternate form" for the very similar
case of quoting within variable names.  To be honest, I didn't realise
the similarity of the two cases at the time, and just coded them with
whatever came to hand.  *Ex post facto*, it seems to me that
there *is* an important difference: in this case, only the
handling of ``nLappend`` is affected, and the scope of the
``inlist`` variable can thus be made extremely tight.  In the
other case, there is an interaction between ``nVar`` and
``nWord`` processing, so a state variable would need to have
the function scope of ``dollar`` anyway.

*2001-10-03*

The sneaky parens bug
---------------------


On output (``Tconv()``), *rc* would quote a string if and only if it
was quoted on input.  *Wolfgang Zekoll* discovered that it's possible
to sneak a quote-worthy string in by other means: specifically, by
surrounding a variable name with parentheses.  (I suspect there may be
other ways, but I haven't discovered any.) Like this::

    ; (a.b) = foo
    ; fn x { echo $(a.b) }
    ; x
    foo
    ; whatis x
    fn x {echo $a.b}

Note that the last line is wrong: on input this is parsed as
``echo $a^.b``.  This means that the function ``x``
doesn't work in descendant *rc* processes.



The fix is in two parts.  First, *rc* now scans
strings for metacharacters to decide if they need quoting or not.
This is handled by ``quotep()`` (which I put in
``lex.c``, since it accesses ``nw[]`` and
``dnw[]`` which, I claim, should be ``static`` to
that file&mdash;at present they're not, but maybe one day.)  This is
made uglier by the different quoting rules needed for variable names
and other words, which is all down to free carets.  I'm afraid I
introduced an "alternate form" for ``Tconv()``
(i.e. ``fmtprint(... "%#T" ...)``) which turns on the
variable name quoting rules.



Now there is *almost* no need for the distinction between the
``Node`` types ``nWord`` and ``nQword``, so
I abolished it.  (Note that in 7 out of 9 places, they were treated
identically.  The eighth was in ``Tconv()``, which was fixed
as described above.  The ninth we shall come to shortly.)



Removing this distinction tidied up the lexer and parser slightly:
previously, the lexer returned a quoted string as ``'foo``,
which resulted in an unnatural ``do... while`` loop in the
lexer, and a mysterious ``+1`` in the parser.  Unfortunately
(here comes the ninth case) it also broke here documents:
``<<'EOF'`` is a very different thing from
``<<EOF``.  Darn.


So I added a ``q`` field to ``struct Word`` to hold
this information.  The lexer sets ``q`` if the string was
quoted; the only place it is tested is in
``heredoc.c::qdoc()``.

And that's it.  On reflection, the bug could probably have been fixed
simply by resetting ``dollar`` when the lexer hits
``(``.  However, I think the new code is cleaner (and the
compiler agrees: the binary is slightly smaller).  It also means that
*rc* now uses "minimal quoting" on output::

    ; fn x { echo 'foo' }
    ; whatis x
    fn x {echo foo}

Earlier versions of *rc* would have maintained the
quotes on ``'foo'``.

Incidentally, the "hilarious quoting bug", mentioned in
``trip.rc`` comes from here.  In ``tree.c::mk()``, I
added the new member to ``nWord``, but omitted to update the
size of it.  Hilarity ensues: on my development machine, the result
was that alternate members of a list were quoted::

    ; x=('#' '#' '#')
    ; whatis x
    x=('#' # '#')

I don't suppose this particular bug will ever occur again, but if
it does, the new test will catch it.

*2001-10-02*

The Ctrl-A bug
--------------

(Actually it's a misfeature, not a bug.)

This is relatively straightforward.  I stole ``Wconv()`` from *es* to
output a list with ``^A`` and ``^B`` quoting.  The functions are short
enough that I don't think it would be worth trying to make ``Lconv()``
more generic, to handle all possible cases of outputting lists.

I rewrote ``footobar.c::parse_var()`` completely, removing the "Hacky
routine...  [that] scans backwards".  The new code is straightforward,
and I don't believe it's any less efficient than the old: both
basically examine the string twice.

*2001-09-27*

