cvs commit: www/test generalnews.txt search.html website-bottom.html

Seth W. Klein sk at
Fri Aug 8 08:48:41 PDT 2003

On Thu, Aug 07, 2003 at 11:19:02PM -0400, Simon Comeau Martel wrote:
> On Thu, 07 Aug 2003 19:48:22 -0400
> "Seth W. Klein" <sk at> wrote:
> > > > sed -i 's/\([.,;]\) /\1\n/g'
> > > 
> > > IMO inserting \n after each comma is not a really good idea...
> [....]
> IMO, we don't want lines with only four words.

Ahh, but we do; see below.

> > However... that code most definitely does not insert a newline after
> > each comma--it only does commas that have a space after them.
> True, but AFAIK, there is always a space after a comma.

I actually listed the most common exception to that in a part you
snipped. It is: comma-quote, as in, "Why yes," she said. Numbers, like
10,000, are another example.
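
To make those exceptions concrete, here is a minimal sketch of the same
substitution using Python's re module instead of sed. The function name
and the sample sentence are made up for illustration; only the pattern
itself comes from the sed expression quoted above.

```python
import re


def break_after_punctuation(text: str) -> str:
    # Same idea as the sed expression  s/\([.,;]\) /\1\n/g :
    # a '.', ',' or ';' that is followed by a space keeps the
    # punctuation and has the space replaced by a newline.
    return re.sub(r"([.,;]) ", r"\1\n", text)


# Hypothetical sample containing both exceptions: comma-quote and 10,000.
sample = '"Why yes," she said, and paid 10,000; then she left.'
print(break_after_punctuation(sample))
```

The comma in "yes," is followed by a quote and the one in 10,000 by a
digit, so neither matches; only the punctuation followed by a space
starts a new line.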

> (I got your point that the space is not left at the beginning of the new line)

I never made that point, although it is quite true.

> I think what we want is something that acts like the wrapper of our MUA, and AFAICS they don't care about the notions of "phrase" or "punctuation".

That's word wrapping, which is what you want in the rendered output
(ideally with a better algorithm than simple word wrap; see TeX for an
example). But if you do that with the source, adding or removing a
single word can reflow the entire paragraph, which makes for terrible
diffs. The target line length is no more than 40 to 50, or maybe 60,
characters, which leaves room for additions. Shorter is better than
longer because a shorter line is less likely to need breaking, and
breaking a line pulls text that hasn't changed into the diff.
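
The effect on diffs can be sketched roughly as follows, assuming
Python's textwrap stands in for a mail client's word wrapper and
difflib for the diff tool. The sample paragraph, the helper names, and
the width of 40 are all assumptions chosen for illustration.

```python
import difflib
import re
import textwrap


def changed_lines(old: list[str], new: list[str]) -> int:
    # Count every line a diff would show as removed or added.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def phrase_lines(text: str) -> list[str]:
    # Break the source after '.', ',' or ';' when a space follows.
    return re.split(r"(?<=[.,;]) ", text)


# Hypothetical paragraph; the edit inserts a single word.
old_text = (
    "The website is built from plain text sources, which are kept in "
    "CVS; every commit is mailed to the list, so reviewers read diffs. "
    "Short source lines keep those diffs small."
)
new_text = old_text.replace("The website", "The project website", 1)

wrapped = changed_lines(
    textwrap.wrap(old_text, width=40), textwrap.wrap(new_text, width=40)
)
broken = changed_lines(phrase_lines(old_text), phrase_lines(new_text))
print("word-wrapped source:", wrapped, "changed lines")
print("phrase-broken source:", broken, "changed lines")
```

With the phrase-broken source, only the line containing the new word
shows up in the diff; with the word-wrapped source, the lines after it
are reflowed and show up as changes too.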

Seth W. Klein

P.S. When sending to lists where code is often exchanged, such as this
one, it is very helpful if you set your mail client to wrap outgoing
mail at 72 characters so those clients which respect the line breaks in
the original message can format replies correctly.
