update-website.mk and mirroring issue

Anderson Lizardo lizardo at linuxfromscratch.org
Tue Nov 23 18:20:54 PST 2004

On Sunday 21 November 2004 14:22, Matthew Burgess wrote:
> Anderson Lizardo wrote:
> > But then I ask: what's the real purpose of the "timestamp" file? I know
> > it allows the mirrors to only run rsync if this file is changed, but does
> > rsync not do this check already?
> My understanding of rsync is that it will only sync *changed* files.  So
> there should be no need for us to keep our own housekeeping information
> for this purpose.  If our site changes, the mirrors' rsync process will
> spot them the next time they run. If the site hasn't changed next time
> they rsync, then nothing gets transferred.

That's what I think too (BTW, rsync exits really fast when nothing needs 
updating). I suppose Gerard added the timestamp check to avoid unnecessary 
rsync runs, but I don't think rsync causes too much load when nothing needs 

I want to check this anyway. Jeremy H.: can you set up that test mirror at 
jenacon.net (I suppose it's down, at least lfs.jenacon.net doesn't work 

> > Another thing: with the current
> > http://linuxfromscratch.org/~gerard/lfs-rsync.sh script used by all
> > mirrors, the ".svn" dir (along with other files, like news-YYYY.txt,
> > templates) are rsync'ed even though they are not necessary on the "final"
> > website. This is avoidable by using the --exclude-from rsync option (see
> > update-website.mk's "run-rsync" rule), and saves some MB of bandwidth
> > (aprox. 12MB of .svn dirs).
> OK, this suggests that they're mirroring a working copy of the SVN
> repos.  I don't like that idea much.  This issue appears on the
> svn-users list every now and again, as people seem to insist on having
> their live site as simply a working copy of the repository.  Why can't
> we use 'svn export' here to get a clean, unversioned, copy of a
> particular revision of the repository?  This would negate having to have
> a .htaccess that has to ignore .svn directories, and the corresponding
> flag to rsync.

I came up with the working copy implementation because "svn export" is too 
expensive for a post-commit script. Actually, the "svn export" approach is 
what we have now... It has some problems:

1) It's not "atomic", because it requires an "rm -rf" (which has caused some 
trouble, as shown by the last website deletion) before moving the new content 
to TARGETDIR, so the website is partially unavailable during this short period.

2) It's very expensive: the script takes a long time recreating the entire 
website tree even if we didn't change anything that day.

3) Being expensive, it needs to run as a cron job, meaning that the website 
always lags behind what we have in the repository (e.g. currently, if we add 
a news item, we have to run run-update-website.sh manually; otherwise it goes 
live only hours later).

There may be other issues, but the three above are the most important.
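
As a rough illustration of (1) and (2), an export-based update might look 
something like the sketch below. This is not the actual script; the function 
name, the staging directory and the repository URL argument are assumptions 
made only for the example.

```shell
#!/bin/sh
# Hypothetical sketch of an export-based publish step (all names are assumptions).
publish_export() {
    repo_url=$1                  # repository URL to export (assumed argument)
    targetdir=$2                 # live website directory (TARGETDIR)
    staging="${targetdir}.new"   # assumed staging location

    rm -rf "$staging"
    # Problem (2): the whole tree is exported every time, changed or not.
    svn export -q "$repo_url" "$staging" || return 1

    # Problem (1): between these two commands TARGETDIR does not exist,
    # so the site is unavailable for a short period.
    rm -rf "$targetdir"
    mv "$staging" "$targetdir"
}
```

The window between the "rm -rf" and the "mv" is what makes this approach 
non-atomic, and the full export on every run is what makes it too slow for a 
post-commit hook.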

Now, thinking more carefully about using a working copy on the live site, it's 
possible that it's not the best solution ;). For example, if someone 
accidentally changes something manually inside TARGETDIR, that working copy 
would no longer be clean, and could even cause merge conflicts after some 
revisions. (BTW, does anyone know a Subversion command similar to 
"cvs update -C"?)

So I suggest another approach (actually an extension of my current one) that 
fixes (1)+(2)+(3) and still provides a clean website tree in TARGETDIR:

Keep the working copy in a separate dir (e.g. /var/website_repos; I'm open to 
better path suggestions) and then, after running "update-website.mk" (run by 
a post-commit hook) on it, run rsync locally with the appropriate parameters 
to avoid rsyncing unnecessary files (something similar to update-website.mk's 
"run-rsync" rule). It's very simple, believe me :)

With this approach we:

- Keep a clean live site;
- Avoid changing all the mirrors' rsync scripts.
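
A minimal sketch of how the proposal above could be wired together, assuming 
the makefile sits at the top of the working copy, that its default target does 
the regeneration, and that the exclude patterns live in a separate file. The 
function name, the argument layout and the use of --delete are assumptions for 
illustration, not the contents of the real update-website.mk.

```shell
#!/bin/sh
# Hypothetical sketch; function name, paths and exclude file are assumptions.
sync_website() {
    workdir=$1     # working copy kept outside the live site, e.g. /var/website_repos
    targetdir=$2   # live website directory (TARGETDIR)
    excludes=$3    # file with one rsync exclude pattern per line (.svn, templates, ...)

    # Bring the working copy up to date; only changed files are fetched.
    svn update -q "$workdir" || return 1

    # Regenerate the site inside the working copy
    # (assumes update-website.mk's default target does this).
    make -C "$workdir" -f update-website.mk || return 1

    # Copy only the differences to the live site, skipping excluded files.
    # --delete (an assumption) removes live files that no longer exist
    # in the working copy.
    rsync -a --delete --exclude-from="$excludes" "$workdir/" "$targetdir/"
}
```

It could be called from the post-commit hook as, for example, 
sync_website /var/website_repos "$TARGETDIR" /path/to/excludes (the last path 
is a placeholder). Since the live tree then never contains .svn directories, 
the mirrors' existing rsync scripts would not need an exclude list of their own.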

What do you guys think?
Anderson Lizardo
lizardo at linuxfromscratch.org
