Draft of archive_news.pl script

Anderson Lizardo andersonlizardo at yahoo.com.br
Mon Aug 25 19:42:16 PDT 2003

Jeroen Coumans wrote:
> > - The links on the website (including that on the news contents)
> > are relative URIs. Because of this, some links will become broken
> > on the news archives. As a workaround, I inserted a <base
> > href="../../" /> tag on the archive-top.html file. But now,
> > fragment links to the current file, like "#header", will not work.
> Can the relative URL's be converted to static URL's in the script?
> This should make all links work again.

The conversion is possible, but it doesn't fully resolve the problem. 
Here is why:
Suppose we have one file named http://www.lfs.org/lfs/news.html, with 
these links:
<a href="#header">Link1</a>
<a href="../blfs/news.html">Link2</a>

By default, the base URL is the URL of the current file. So, the links 
above are converted to:
<a href="http://www.lfs.org/lfs/news.html#header">Link1</a>
<a href="http://www.lfs.org/lfs/../blfs/news.html">Link2</a>

The links above work fine until we mv news.html to 
http://www.lfs.org/news/lfs/YYYY/MM.html. With the new location as the 
base URL, they become:
<a href="http://www.lfs.org/news/lfs/YYYY/MM.html#header">Link1</a>
<a href="http://www.lfs.org/news/lfs/YYYY/../blfs/news.html">Link2</a>

Link1 still works, because the current file is actually the same as 
before (just moved to another place). But Link2 doesn't: it now resolves 
under /news/lfs/ instead of the site root. And what if we set the base 
URL to http://www.lfs.org/lfs/news.html (the original one)? 
The links are then expanded to this:
<a href="http://www.lfs.org/lfs/news.html#header">Link1</a>
<a href="http://www.lfs.org/lfs/../blfs/news.html">Link2</a>

All links work now, but the navigational link "#header" will now be an 
anchor in another file (the live news.html), not in the current one.
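
To make the two cases above concrete, here is a small sketch in Python 
(not the Perl script itself) that resolves the same two links against 
both base URLs using the standard urllib.parse.urljoin. The names 
ORIGINAL_BASE, ARCHIVED_BASE and resolve_all are only for illustration. 
Note that urljoin also collapses "lfs/../blfs" into "blfs", which a 
browser would do as well.

```python
from urllib.parse import urljoin

# Base URLs taken from the example above; YYYY/MM are kept as placeholders.
ORIGINAL_BASE = "http://www.lfs.org/lfs/news.html"
ARCHIVED_BASE = "http://www.lfs.org/news/lfs/YYYY/MM.html"
HREFS = ["#header", "../blfs/news.html"]


def resolve_all(base, hrefs):
    """Resolve each href against base, the way a browser would."""
    return [urljoin(base, href) for href in hrefs]


original = resolve_all(ORIGINAL_BASE, HREFS)
archived = resolve_all(ARCHIVED_BASE, HREFS)

for label, urls in (("original base", original), ("archived base", archived)):
    for url in urls:
        print(f"{label}: {url}")
```

With the archived base, Link1 stays inside the archived file but Link2 
lands under /news/lfs/. With the original base, Link2 is right but Link1 
points back at the live news.html.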

In short, the conversion is possible, but it will have at least one 
side effect: broken "navigational" links (like #rootcontent, 
#generalnav, etc.). My suggestion is to remove these links from the 
archive-{top,bottom}.html templates.

> > - This is a crude hack, and I recommend we use the method described
> > in
> > http://archives.linuxfromscratch.org/mail-archives/website/2003-August/000469.html.
> >
> > This script can be adapted to convert the current news to this
> > format, and I could write a script to parse it.
> Yes please, the above method is a lot better and also automates news
> page creation.  BTW I assume that method allows for multiple <p>'s?

Yep, and any other XHTML tag you want.

> It wasn't really clear from your mail. And the above method will also
> automate the news page generation, right? (save for the scripts to
> generate them, but I can write those myself)

Yes. The method involves the following steps:
1) Someone should write a news item and insert it at the top of news.txt
2) The Perl script, run daily from fcron, will parse each news.txt, 
archive the news, and output the 5 most recent items to temporary files 
(e.g. {lfs,blfs}/news.tmp).
3) Another script, made by you, will cat the correct files (including 
Changelog, general news, etc.) and create the respective news.html.
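
A rough sketch of step 2, written in Python only to show the idea (the 
real script would be Perl). The news.txt format is NOT defined in this 
mail, so I am assuming something simple here: each item starts with a 
line beginning with a YYYY-MM-DD date, newest item first, and everything 
up to the next such line belongs to that item. The function names and 
the ITEM_HEADER pattern are hypothetical too.

```python
import re
from collections import defaultdict

# ASSUMPTION: an item begins with a line like "2003-08-25 Some title".
ITEM_HEADER = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\b")


def parse_items(text):
    """Split news.txt content into (year, month, raw_item_text) tuples."""
    items = []
    for line in text.splitlines():
        match = ITEM_HEADER.match(line)
        if match:
            # A dated line starts a new item.
            items.append([match.group(1), match.group(2), [line]])
        elif items:
            # Any other line (including blank ones) continues the current item.
            items[-1][2].append(line)
    return [(year, month, "\n".join(lines).strip()) for year, month, lines in items]


def archive_paths(items, project):
    """Group items by archive file, e.g. news/lfs/2003/08.html."""
    archives = defaultdict(list)
    for year, month, raw in items:
        archives[f"news/{project}/{year}/{month}.html"].append(raw)
    return dict(archives)


def most_recent(items, count=5):
    """Return the newest items, assuming news.txt is ordered newest first."""
    return [raw for _, _, raw in items[:count]]
```

The output of most_recent() is what would be written to 
{lfs,blfs}/news.tmp, and archive_paths() gives the monthly files to 
update. Wrapping each file in its templates is left to step 3.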

One important thing is that there must be separate templates (the 
*-{top,bottom}.html files) for current news and for archived news, 
because of the differences in the base URL and in the specific 
left-side menu.

Another thing: on news sites, like Slashdot, Linux Today, etc., it's 
common to see links to previous news. When we have to make a link to 
old news, we should use an absolute URL like 
http://www.lfs.org/news/lfs/2003/08.html#newsid. When the archives 
go on-line, you should convert existing links to old news 
to this format.
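
As a small illustration of that URL format, here is a Python sketch of a 
helper that builds such a link. The function name archive_url and its 
parameters are made up for this example; only the URL layout comes from 
the example above.

```python
def archive_url(project, year, month, news_id, site="http://www.lfs.org"):
    """Build an absolute link to an archived news item.

    The layout <site>/news/<project>/<YYYY>/<MM>.html#<id> follows the
    example URL in this mail; everything else is an assumption.
    """
    return f"{site}/news/{project}/{year:04d}/{month:02d}.html#{news_id}"


print(archive_url("lfs", 2003, 8, "newsid"))
```

Because the link is absolute, it keeps working no matter which page or 
archive file it is placed in.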

> I'll take care of creating the {project}/news.txt and the respective
> news/archive templates so you can focus on the script.


> BTW we can mail lfs-chat to see if there are other perl-coders
> willing to work on this if your time is that limited...?

No problem :-). This way, the TODO list will become empty more quickly.

OT: I've found some time before going to sleep to answer e-mails ;-)
Anderson Lizardo
