[freebsdzine.org] April 2001 : Characterizing the FreeBSD File Tree

A real patriot is the fellow who gets a parking ticket and rejoices that the system works.

[ Home · Latest BSD News · Site Statistics · Wanted Articles · Request an Article · Submit an Article ]

Search freebsdzine.org

FreeBSD 'zine Polls

How did you find out about this site?

Sections

· Wanted articles
· Request an article
· Contribute
· Mailing lists
· About the site
· The staff
· Copyright info
· Privacy policy
· Change log
· Contact us

Resources

· The FreeBSD Project
· The FreeBSD Diary
· BSD Today
· Daemon News
· Daily Daemon News
· Slashdot BSD
· FreshPorts
· The FreeBSD Mall
· BSDVault
· The FreeBSD Browser
· GreasyDaemon.com
· iso-nix.com

FreeBSD Books

· Complete FreeBSD
· FreeBSD Handbook
· FreeBSD Corporate

Networker's Guide

Characterizing the FreeBSD File Tree
Rich Morin <[email protected]>

Last month's article (The FreeBSD Browser) said quite a bit about the purposes of the browser, but very little about how it actually works. This month, I would like to cover some high-level design issues, while postponing showing off my (rather hacky :-) implementation.

The hardest job that the browser performs is finding a "closely related" set of files and manual pages. It does this by applying some fairly naive graph traversal logic to some (pre-defined and dynamically-generated) linkage information. Let's look at the linkage information first, then move on to the graph traversal logic.

Linkage Information

The demo obtains its linkage information from a variety of sources, including automated analysis of system files, hand-edited XML files, and rule-based links. The automated analysis is performed by a relatively small amount of code, but it generates hundreds of thousands of links. Hand editing adds a few thousand more links, some annotations, etc. The CGI script weaves this material together, adding some (rule-based) links of its own.

Automated analysis is great for some things. Grabbing file system links (hard and symbolic) is a piece of cake. Parsing man pages for FILES and SEE ALSO references isn't much harder. It's even possible, with a bit of cheating, to pull "contrib" directory references out of random makefiles. In short, automated analysis provides a lot of leverage.

When automated analysis starts to break down, hand-edited annotations can be used to fill in the gaps. If the man pages don't mention particular files or man pages, snippets of hand-coded XML can make the missing connections. Unfortunately, although any given bit of XML is easy to generate, producing hundreds or thousands of snippets can be a mind-boggling task. So, I have tried to keep the number of hand-edited files to a minimum; the current number is just short of a thousand...

One trick I use is a simple-minded form of inheritance. The tree of hand-edited files contains sub-trees for all/all (all OS variants and versions), fb/all (FreeBSD, all versions), fb/42 (FreeBSD, version 4.2), etc. The database loading script finds the most specific match it can for each node in the file tree. A simple "include" mechanism can be used to import more abstract declarations into, say, an fb/42 annotation file.

If a relationship can be encoded (e.g., by a bit of run-time Perl code), we can avoid storing large numbers of explicit links. This saves storage and batch-mode processing time, but it may have an offsetting cost at run time. Deciding which things to encode in the database and which things to encode as rules is a tricky proposition; the current balance is workable, but I wouldn't claim that it is optimal...

I have only implemented a few categories of rules to date, but I expect to add more over time. If you think of any rules that might be useful, please let me know! Here, in any case, is a summary of the browser's current rule base:

Manual Pages

Files in certain directories (e.g., /bin) commonly have manual pages in well-known places (e.g., section 1). By guessing (and then confirming) the existence of related files, the CGI script can map man files to manual pages, and vice versa.
Source and Build Directories

FreeBSD's build tree has a (mostly :-) regular mapping to the resulting system. The exceptions are few enough that some table-driven code suffices to map many items to their source and build directories.
Symbolic Links

Although automated analysis is used to find symbolic links, a rule is used to assess the interesting implications of a given link. Thus, if a symlink is (or points to) part of a path name, the result is another path name for the item in question.

Graph Traversal

The CGI script performs a "constrained, breadth-first search" of the links that are connected to the initial item. The initial item is used to seed some Perl hashes, which are then used to motivate a single-level search. As each new item is "discovered", assorted information about the associated link is added to the hashes. This information is used to constrain the search, explain the logic to humans (via "Linkage Trail" pages), etc.

For example, each link type has an associated a "weight" value. The initial item has a weight of 1.0, but some links have weights of less than this. As links are added to the linkage trail, the link weights are multiplied together. If the product falls below a threshold, no further searching is done.

An even simpler mechanism, based on the number of links traversed, keeps the script from chasing its tail around link cycles, etc. The perceptive reader will note that this is borrowed from the "time to live" technique that the Internet uses to keep packets from bouncing around the network forever.

Some items are worth reporting, but do not qualify for further investigation. If the "follow" flag on a link is false, no further searching is done in that direction. A man page's FILE or SEE ALSO references to a preceding item fall into this category.

In summary, the browser searches a local "neighborhood" of the original item, stopping when all of the trails have "died out". By adding links, tuning the weights and tweaking the code, I have been able to increase the number of "interesting" results, while keeping the "noise" (i.e., false positives) under control.

Future Directions

I have recently started to add Mac OS X coverage to the browser and may also add NetBSD to the mix. Covering a "family" of similar operating systems allows me to test (and extend) the flexibility of the design, while staying in a relatively constrained problem domain. Once this family is under control, I can consider adding other operating systems to the coverage.

With few exceptions, the automated analysis stops at the file level. By adding parsing rules for assorted file types (e.g., control files, man pages, source files), I could allow the browser to explore more types of relationships. At the same time, adding some file-level browsing capability would allow users to follow this information into the files themselves.

I have also been thinking about how to analyze, characterize, and represent "subsystems" such as the mail and print spoolers. Because this problem brings in the possibility of handling the entire Ports Collection, some caution is warranted in approaching it!

The browser's current (web-based) design limits it to covering "generic" operating system information. By creating support for "local clients" (e.g., via XML), I could allow a user to examine his or her own system, using the browser's information resources. This would be a significant step towards the implementation of a true Meta system.

Return to the April 2001 Issue

Issues

2001
· April
· March
· February
· January

2000
· December
· August
· July
· June
· May
· April
· March
· February

1999
· January

Other issues from 1999 are available in the attic for now.

Other News

· Slashdot
· FreeBSD Diary
· BSD Today
· FreshPorts
· Daemon News
· OS Online
· Rootprompt
· Maximum BSD

Miscellaneous

· Jim's site

IRC

#freebsdzine
If you'd like to hang out with us and talk about the site, join us in #freebsdzine on Undernet.

Backend

You can add a list of our latest issue's articles to your site by using our RDF/RSS file. You can also add it to your My Netscape page, or add our slashbox once you log in over at Slashdot.

[ Home · Latest BSD News · Site Statistics · Wanted Articles · Request an Article · Submit an Article ]