-*- outline -*-

* Main outline

** What is Content Management (CM)?

The term is used to describe a number of things:

 - a set of design principles; a conceptual toolkit
 - off-the-shelf products for managing interlinked data and media
 - a way of relating to the customer

** When is it appropriate?

Always. The scale of the CMS is the important variable. Even simple
websites can be seen as CMSs - sometimes one or more of the pieces is
a degenerate case, but the system as a whole is still conceptually a
CMS.

Rough rule of thumb: a more complex, automated CMS is needed when you
want to present the same information in more than one way; or, when
the work involved in hand-rolling the presentation of the information
starts to get bigger than the work involved in building a generic
information-presenting program.

Other reasons for choosing a more complex design include:

  - to provide automated site management, without the need for manual
    intervention to arrange for the correct presentation of new
    content.

  - to provide separation of roles - content editor vs. presentation
    specialist.

** The structure of a CMS - Model, View, Controller

One of the lessons of software design over the past forty years is
that the data itself, and the internal management and book-keeping of
the model, must be kept totally separate from the parts of the code
that deal with presentation and manipulation of the model.

The Smalltalk team, while inventing the modern graphical user
interface during the 1970s, coined terminology to describe the common
patterns they saw in advanced software for a personal computer:

*** The Model

The model is the abstract representation of the domain of concern.

 - Wiki pages
 - Blog entries, posting categories, trackbacks
 - Problem reports, users, product categories

The model (in an ideal world) does not have any presentation-related
interfaces to it. All of the interfaces to the model are programmatic,
and solely domain-related.

Of course, in the real world, sometimes it's unclear where the correct
interface between model and view (described next) should lie. This is
why refactoring is so important.

While the model is an abstract representation of the domain, it must
of course have a concrete implementation itself. The implementation
should also not affect the interface to the model - again, this is an
ideal that can only be asymptotically approached.
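
A minimal sketch of a model whose interface is purely domain-related,
in Python. The names Blog, Post, add_post, posts_in and categories are
hypothetical, invented for illustration. Nothing in the interface
mentions HTML, and nothing reveals how the posts are stored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    title: str
    body: str
    category: str


@dataclass
class Blog:
    """Hypothetical model: domain operations only, no presentation."""

    _posts: list = field(default_factory=list)

    def add_post(self, title, body, category):
        post = Post(title, body, category)
        self._posts.append(post)
        return post

    def posts_in(self, category):
        # Callers ask a domain question; the storage (a list here,
        # an assumption) could change without affecting them.
        return [p for p in self._posts if p.category == category]

    def categories(self):
        return sorted({p.category for p in self._posts})
```

Swapping the list for a database should not change any of the three
method signatures; if it does, the interface was exposing too much
representation.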

*** The View

The view is the concrete representation of the domain of concern.

 - HTML web pages (and forms)
 - RSS feeds
 - client-side GUIs

Often the view and the controller (described next) are tightly
integrated. In fact, it can be very difficult to see controllers as
independent objects - which is why many modern systems are tending to
combine the view and the controller roles into single objects. This
sometimes has benefits (eg. Morphic), and sometimes disadvantages
(eg. Swing).

*** The Controller

The controller is the aspect of the system that responds to user
actions, updating the model and hence the view. The controller
integrates the entire system into a running application.

HTML forms make use of the controller functionality implicit in the
browser. When the web browser accepts user input into form fields,
marshals the data, and submits the data to the web server, that's
half of the controller story - the other half is how the web server
processes the submitted data, updates the model, and generates a fresh
view for transmission back to the browser.
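
The server-side half of that story might be sketched as follows. This
is an illustration only: the pages dictionary stands in for the model,
and the function and form field names (render_page, handle_submit,
"page", "text") are assumptions, not part of any framework.

```python
import html
from urllib.parse import parse_qs

# Hypothetical model: wiki page name -> page text.
pages = {"FrontPage": "Welcome"}


def render_page(name):
    """View: turn model data into HTML."""
    text = html.escape(pages.get(name, ""))
    return f"<h1>{html.escape(name)}</h1><p>{text}</p>"


def handle_submit(form_body):
    """Server-side half of the controller.

    Takes the URL-encoded body the browser marshalled from the form,
    updates the model, and returns a fresh view.
    """
    fields = parse_qs(form_body)
    name = fields["page"][0]
    pages[name] = fields["text"][0]
    return render_page(name)
```

Note that the model stores the text as typed; escaping for HTML
happens only in the view.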

*** Recursion of the layers

Each layer seems to have aspects of the others within it. Determining
where the best place is to make cuts between the layers in your
application is a skill that needs practice. Experts get it wrong all
the time (eg. Swing).

*** Web Frameworks

In modern web frameworks, we are starting to see a clearer separation
of the three MVC layers. Often a database is used to make the data
backing the model persistent; the model itself is usually a (fairly
thin) layer atop the database; the view is often an
XML-transformation-based presentation pipeline; and the controller is
partly implicit in the browser and partly implemented via another thin
layer atop the model.

** The model is the key idea in a CMS

The most important thing for your application is the data, and the way
the data should be interpreted. This cannot be stressed
enough. Presentation and data representation pale in comparison to the
importance of getting the interface to the model correct. The
interface should be focussed around maintaining an appropriate level
of abstraction of the model. The database backing the model should be
simultaneously readily-accessible, and presentation-neutral.

There are two slippery slopes in model design: you can get too close
to the representation, the way data is stored, or you can get too
close to the presentation, the way data is to be displayed or
edited. It can be *very* difficult to keep away from the two
attractors.

Refactoring is vital - if the interface to the model isn't quite
right, exposing too much representation, or too much presentation, it
must be corrected for the health of the system; and the sooner this is
done, the better.

*** Digression: reflection - where is the meaning?

Where is the *meaning* of a piece of information? What makes a model
meaningful?

The meaning of a set of information is partly in the information
itself, and partly in the interpretation of the
information.

(Example: the same sentence seen as bits, as hex, as ASCII-encoded
text, as words, as a parse tree, as a symbol)
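
A small sketch of those layers of interpretation in Python. The
sentence and the hand-written parse tree are assumptions chosen for
illustration; the point is that the stored bytes never change, only
the reading of them.

```python
sentence = "the cat sat"

raw = sentence.encode("ascii")                 # the stored bytes
bits = " ".join(f"{b:08b}" for b in raw[:2])   # first two bytes as bits
hexed = raw.hex()                              # the same bytes as hex
words = sentence.split()                       # read as a list of words

# One possible parse tree, written by hand (an assumption, not the
# output of a parser).
tree = ("S", ("NP", "the", "cat"), ("VP", "sat"))
```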

** Interpreting the model

When you present the information in the database to the user, you're
choosing an interpretation. Make sure the level of abstraction fits
the intended communication - even if the representation of the
information is more low-level.

Use the tools your language gives you to work at the correct level of
abstraction. There's no point in grovelling over a set of tables when
you could give a high-level query. High-level queries free you from
the representation, so you can change it later if need be.

(Aside: When using SQL databases, views and stored procedures can help
isolate you from changes in representation.)

** Encodings

Your job in planning your CMS is to choose an appropriate level of
abstraction for your domain, and to design a little language for
talking about and manipulating objects in the domain. Your server-side
scripts will then use this little language. Languages can scale from a
simple set of subroutines through to a full-blown programming language
with its own syntax, semantics and IDE.

If there's not a great mismatch between your domain language and the
language implemented by an off-the-shelf CMS, you can wedge your
application in there. This is called encoding - you encode your domain
language in someone else's domain language. Usually CMSs don't have
good support for automatic encoding, so you have to do it by
hand. Beyond a certain level of mismatch it becomes cheaper to write
your own tool: either to encode into another CMS's domain, or to manage
your domain directly.
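
A sketch of a hand-written encoding, in Python. GenericPageStore is a
stand-in for an off-the-shelf CMS that only understands pages; it and
the two domain subroutines (file_report, reports_for) are
hypothetical. The domain language here is the smallest kind: a simple
set of subroutines.

```python
class GenericPageStore:
    """Stand-in for a CMS that only knows paths and page text."""

    def __init__(self):
        self.pages = {}

    def put(self, path, text):
        self.pages[path] = text

    def get(self, path):
        return self.pages[path]

    def list(self, prefix):
        return sorted(p for p in self.pages if p.startswith(prefix))


# The domain language: problem reports, encoded by hand into pages.
def file_report(store, product, report_id, summary):
    store.put(f"reports/{product}/{report_id}", summary)


def reports_for(store, product):
    prefix = f"reports/{product}/"
    return [(p[len(prefix):], store.get(p)) for p in store.list(prefix)]
```

Scripts written against file_report and reports_for never see the
path convention, so the encoding can be replaced later.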

** Two ways of working with Wiki

Wiki markup, in any implementation, can be seen in two different ways:
either as a set of instructions for spitting out HTML tags, or as a
description of an HTML document. It's both data, and *simultaneously*
program code for a simple HTML-oriented DSL.
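
The two readings can be sketched in Python. The markup syntax used
here ("= " for a heading, doubled single quotes for emphasis) is
invented for the example and does not match any particular Wiki
implementation.

```python
import html
import re


def parse(markup):
    """Read the markup as data: a description of a document."""
    blocks = []
    for line in markup.splitlines():
        if line.startswith("= "):
            blocks.append(("heading", line[2:]))
        elif line.strip():
            blocks.append(("para", line))
    return blocks


def emit(markup):
    """Read the same markup as a program: each line emits tags."""
    out = []
    for line in markup.splitlines():
        if not line.strip():
            continue
        tag, text = ("h1", line[2:]) if line.startswith("= ") else ("p", line)
        # quote=False leaves the '' emphasis markers intact.
        text = html.escape(text, quote=False)
        text = re.sub(r"''(.+?)''", r"<em>\1</em>", text)
        out.append(f"<{tag}>{text}</{tag}>")
    return "".join(out)
```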

** Wikis and Creeping Featuritis

The little languages representing a domain of concern can grow,
sometimes quite quickly and unexpectedly. As an example, the original
Wiki by Ward Cunningham only supported a few different kinds of
markup, and was not extensible. As more people took up the idea and
started building wikis for themselves, Wiki markup grew and grew into
essentially a small ad-hoc domain-specific programming language,
different for each Wiki implementation.

Perhaps it would have been better to start with a full programming
language from the start, augmented with primitives for the domain of
concern? Something like Skribe might make a good choice for a Wiki
markup language.

(Examples: Pyle, and its growth from simple wiki to plugin-based and
scriptable; TiddlyWiki, and its nascent reflective capability with
plugin tiddlers)

** Features of a CMS

*** Access control and permissions

If the organisation managing the database is large, it is often a
requirement that the CMS support separate user logins, each having
their own permission set. Certain areas of the database would be
restricted on a per-account basis.

Techniques for managing permission sets include capabilities and
Access Control Lists (ACLs).
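
A rough sketch of the difference, in Python. With an ACL the
protected area keeps a list of who may do what; with capabilities,
holding an unguessable token is itself the permission. All names and
data below are hypothetical.

```python
import secrets

# ACL: area -> user -> allowed actions.
acl = {"/reports": {"alice": {"read", "write"}, "bob": {"read"}}}


def acl_allows(user, area, action):
    return action in acl.get(area, {}).get(user, set())


# Capabilities: token -> the single (area, action) it grants.
_capabilities = {}


def grant(area, action):
    token = secrets.token_hex(16)
    _capabilities[token] = (area, action)
    return token


def capability_allows(token, area, action):
    return _capabilities.get(token) == (area, action)
```

The ACL check needs to know who is asking; the capability check does
not, which makes capabilities easy to delegate and harder to revoke.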

*** Auditing, change tracking and RSS

One feature commonly requested by larger organisations is auditing:
the ability to track each change made to the system. The system keeps
a note of who made the change, a summary of the change itself, and a
timestamp.

Audit systems can be extended into the realm of version control and
change tracking: once you have a record of each change, so long as the
record is detailed enough, you can selectively undo changes,
effectively allowing you to time-travel through the different versions
of the database.
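
A sketch of an audit log detailed enough to support this, in Python.
AuditedStore is a hypothetical key/value store; a real system would
persist the log and would need to distinguish a deleted key from one
whose value is None.

```python
from datetime import datetime, timezone


class AuditedStore:
    """Hypothetical key/value store that records every change."""

    def __init__(self):
        self.log = []   # (timestamp, user, key, old value, new value)
        self.data = {}

    def set(self, user, key, value):
        entry = (datetime.now(timezone.utc), user, key,
                 self.data.get(key), value)
        self.log.append(entry)
        self.data[key] = value

    def as_of(self, n):
        """Time-travel: rebuild the state after the first n changes."""
        state = {}
        for _, _, key, _, new in self.log[:n]:
            state[key] = new
        return state

    def undo(self, user, index):
        """Selectively undo one change by writing back its old value.

        The undo is itself recorded as a new change.
        """
        _, _, key, old, _ = self.log[index]
        self.set(user, key, old)
```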

Another use for audit logs is to publish RSS feeds broadcasting
changes in the site.
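
A sketch of turning such a log into an RSS 2.0 feed using only the
Python standard library. The log entry shape (timestamp, user,
summary) and the function name changes_feed are assumptions; a real
feed would usually also carry a link and guid per item.

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def changes_feed(site_title, site_link, log):
    """Build an RSS 2.0 document from (when, user, summary) entries."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = "Recent changes"
    for when, user, summary in log:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{user}: {summary}"
        # RSS dates use the RFC 822 style that format_datetime emits.
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")
```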

** Off-the-shelf, or DIY?

Off-the-shelf: careful about mismatches, where the encoding is too
complex. You get a lot of important things already implemented!
(Authentication and authorisation, sometimes change tracking, input
validation, DB consistency, many-eyes-make-shallow-bugs, user
community etc.)

DIY: careful about maintainability. Detailed (post-hoc) documentation
of the model and representation is essential. Often need to reinvent
the wheel. Main benefit is control over the encoding of the data and
over the model's performance characteristics. Sometimes a simple CMS
can have its model split out, and the view can be replaced with an
off-the-shelf user interface toolkit or framework.

** Some examples

TiddlyWiki

WordPress

Zope

Pyle (?)

** Aspects of implementing your model

*** Data representation and persistence

Databases - denormalised, normalised; normalising
information. Relational DBs are a badly-engineered hack on an elegant
piece of maths
(http://web.onetel.com/~hughdarwen/TheThirdManifesto/HAVING-A-Blunderful-Time.html
and Darwen & Date in general), plus industrial-strength concurrency
control (ACID properties). A denormalised database is OK, but when it
needs changing, refactoring is really important. You can normalise a
table and replace it with a view, modulo insertion. Adding a column to
a denormalised table means you have to alter each query that
references that table; normalised databases don't have that problem.
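
A sketch of normalising a table and replacing it with a view, using
Python's sqlite3 module and an in-memory database. The table and
column names are invented for the example. The existing SELECT keeps
working unchanged; the failed INSERT at the end is the "modulo
insertion" caveat, since SQLite views are read-only unless INSTEAD OF
triggers are added.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Denormalised: author details repeated on every post.
    CREATE TABLE post (title TEXT, author_name TEXT, author_email TEXT);
    INSERT INTO post VALUES ('Hello', 'alice', 'alice@example.org');
    INSERT INTO post VALUES ('Again', 'alice', 'alice@example.org');
""")

QUERY = "SELECT title, author_email FROM post ORDER BY title"
before = db.execute(QUERY).fetchall()

db.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT UNIQUE, email TEXT);
    INSERT INTO author (name, email)
        SELECT DISTINCT author_name, author_email FROM post;
    CREATE TABLE post_n (title TEXT, author_id INTEGER REFERENCES author(id));
    INSERT INTO post_n
        SELECT p.title, a.id
        FROM post p JOIN author a ON a.name = p.author_name;
    DROP TABLE post;
    -- The view takes over the old table's name and column names.
    CREATE VIEW post AS
        SELECT n.title, a.name AS author_name, a.email AS author_email
        FROM post_n n JOIN author a ON a.id = n.author_id;
""")

after = db.execute(QUERY).fetchall()

try:
    db.execute("INSERT INTO post VALUES ('New', 'bob', 'bob@example.org')")
    insert_worked = True
except sqlite3.Error:
    insert_worked = False
```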

Basic SQL. SQL replication and replication in general. Distribution
becomes easy once you have a suitable choice of abstraction (neither
representation- nor presentation-oriented).

RDF - frame representation. Graphs. The neat trick is the standardised
metadata: you can describe edge categories. An extension to RDF lets
you annotate individual edges, too.
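
A sketch of the graph idea using plain Python tuples rather than an
RDF library. The subjects and data are invented; the predicate names
are borrowed from the Dublin Core, FOAF and RDFS vocabularies purely
as labels. Note that the description of the edge category dc:creator
is itself just another triple.

```python
# Each statement is a (subject, predicate, object) edge in a graph.
triples = {
    ("post:1", "dc:title", "Hello"),
    ("post:1", "dc:creator", "person:alice"),
    ("person:alice", "foaf:name", "Alice"),
    # Metadata about an edge category, stated as an ordinary triple.
    ("dc:creator", "rdfs:label", "author"),
}


def objects(subject, predicate):
    """All objects reachable from subject along predicate edges."""
    return sorted(o for s, p, o in triples
                  if s == subject and p == predicate)


def author_names(post):
    """Follow two edges: post -> creator -> name."""
    return [name
            for person in objects(post, "dc:creator")
            for name in objects(person, "foaf:name")]
```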

Hibernate - use example of perfmap. Compare to the hand-rolled model
in simpleblog.

*** Distribution

Keeping the level of interaction between your model and your view
correct means that you can run the model headless, and separate the
presentation layer completely. This makes for flexibility in terms of
topology - a single model server can serve many different
view/controller clients distributed around the network. For instance,
intranets could all access the same model and present the data
differently. Traditional GUI client applications can be used alongside
HTML frontends.

** Aspects of implementing the view(s) and controller(s)

In general, programs for editing content can be kept completely
separate from the programs for viewing content. In the case of simple
encodings into SQL or the file system, normal SQL- or
file-system-management tools can be used as the editor. When the
domain becomes complex, specialised editors might have to be built,
but they can still be kept completely separate from the presentations
since the data is accessed at the correct level of abstraction for the
domain.

(Examples of different projections: HTML; RSS; RDF; GUI client; Emacs
interface via FTP server (Pyle))
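
A small sketch of two projections of the same model data, in Python.
The posts list stands in for the model, and the function names
as_html and as_json are hypothetical. Neither projection knows about
the other, and neither changes the data.

```python
import html
import json

# Stand-in for data fetched from the model.
posts = [{"title": "Hello & welcome", "category": "news"}]


def as_html(posts):
    """Projection for browsers."""
    items = "".join(f"<li>{html.escape(p['title'])}</li>" for p in posts)
    return f"<ul>{items}</ul>"


def as_json(posts):
    """Projection for scripts and client-side code."""
    return json.dumps({"posts": posts}, sort_keys=True)
```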

** Leaky Abstractions

** Keeping Sites Alive - tips, ideas and pitfalls

** How to make sure a site grows as its parent organisation grows

** Building a simple CMS in ASP

** MUDs/MOOs as a CMS

* Things I want to cover

XML: tips for XML - use namespaces! Beware of old parsers (and XPath)
though. XML represents trees (show infoset) where RDF represents
graphs. Alternatives include JSON. Mention XMLHttpRequest?

Reasons Not To Use Zope. Unless that's your target DSL.

Some LShift examples:

  - MNM perfmap (non-CMS)
  - Expro (RDF-like)
  - NMK
  - bathomebase (simple image-based CMS?)
  - SSM Coal (perl CMS, very simple and rigid)

* Exercise

blosxom-like blog
 - author
 - category
 - post

More than one table, to illustrate joins.

Compare the JSON implementation to FSDB, like blosxom/gyre/fuschia use -
  - FSDB is an encoding of the datastructures of the blog into
    the structures of the FS. Hierarchy of dirs -> hierarchy of categories
    etc.
  - Use normal FS commands (cp, mv, rm, vi) for management of the database
    which obviates the need for a management/admin tool
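
A sketch of the FSDB encoding in Python. The layout is an assumption
made for the exercise: one .txt file per post, the directory path is
the category, and the first line of the file is the title.

```python
import tempfile
from pathlib import Path


def write_post(root, category, slug, text):
    """Encode a post as a file; its directory is its category."""
    folder = Path(root) / category
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{slug}.txt").write_text(text, encoding="utf-8")


def list_posts(root):
    """Decode the tree back into (category, slug, title) tuples."""
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*.txt")):
        category = path.parent.relative_to(root).as_posix()
        title = path.read_text(encoding="utf-8").splitlines()[0]
        found.append((category, path.stem, title))
    return found


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    write_post(root, "tech/web", "cms", "What is a CMS?\nBody text")
    write_post(root, "life", "tea", "On tea\nBody text")
    posts = list_posts(root)
```

Moving a file to another directory with mv recategorises the post; no
admin tool is involved.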

Compare to SQL database
  - it's still an encoding, to tables this time
  - hierarchy not as natural with SQL (although the proper relational
    calculus does better with this)
  - Use SQL table-management commands to manage content, can still do
    without a management/admin tool
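
A sketch of the same blog encoded into tables, using Python's sqlite3
module with an in-memory database. The schema and sample rows are
assumptions made for the exercise. Listing the posts in a category
along with their authors needs two joins.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id),
        category_id INTEGER NOT NULL REFERENCES category(id)
    );
    INSERT INTO author VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO category VALUES (1, 'news'), (2, 'tech');
    INSERT INTO post VALUES
        (1, 'Hello', 1, 1), (2, 'On CMSs', 2, 2), (3, 'More news', 2, 1);
""")

# Posts in one category, with the author's name joined in.
rows = db.execute("""
    SELECT p.title, a.name, c.name
    FROM post p
    JOIN author a ON a.id = p.author_id
    JOIN category c ON c.id = p.category_id
    WHERE c.name = ?
    ORDER BY p.id
""", ("news",)).fetchall()
```

Content management is then plain INSERT, UPDATE and DELETE statements
against the three tables.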

* Excerpts from Emails to Rachel from me

Mikeb and I sat down and had a chat about the workshops the other day,
and came up with a few ideas and a few questions. We don't really have
any idea what level the students will be up to, but we thought perhaps
to compare and contrast a few different CMSs, looking at the
separation of concerns in design and implementation. We thought it
would tie in with the theme "Keeping Websites Alive" by emphasising
the structured data kept at the back-end of each CMS, and the
separation of the data itself from the layers of processing and
rendering for display. We might also briefly mention
RSS/RDF. Obviously, given our technical focus, we'll try to stick to
the *technical* side of keeping sites alive rather than the *marketing* side!

Mikeb mentioned there were coursenotes for the year he taught the
workshops: are there any for this year? We looked at the wiki but
couldn't find anything obviously intended for 2005.

* Excerpts from Emails from Rachel

** Hardware in labs

iMacs. Tried to get PCs in to no avail, sadly.

** Software on Macs

Dreamweaver.

** Server capabilities

Each student has their own virtual server which they can use for ASP
projects.

For some reason I couldn't get the Access drivers to work on them, so
we have been using text-based databases, which have worked well for
small projects.

** Student capabilities

I have taught them XHTML and CSS.

They know enough ASP 3.0 to be getting on with (I know it's an old
technology, but it's the only one cheap enough to give each student
their own managed server, and it's the principles of server-side stuff
I want them to get their heads round, not one particular technology.)

They know the principles involved in flat-file databases and SQL
statements etc, but I haven't taught them much about the relational
side of things.

They have also had about 16 hours teaching in Actionscript 1.0 so
Javascript will not be totally foreign, hopefully, but we're not
talking tech-heads here.

** Excerpts from the original email from Rachel to Andy

Following our conversation, here's a bit more detail about what I and
the students would like you to cover / explore. We have a budget for
15 hours' worth of teaching, so there's a reasonable amount of room
there, although like with most of these subjects I teach, I know you
could probably do an MSc in it and still leave wanting more.

I have now taught the students some ASP, not because it's the best
technology (not by a very long way) but the cost of hosting it has
allowed us to provide each student with their own sandbox server,
something which other courses would be unable to do with a server-side
technology. So they understand the principles now of dynamic sites,
but I think they need to be taught and shown:

 - Intro to the principles - what is CM and is it appropriate for a given
   problem?

 - Off-the-shelf or DIY?
 - Keeping sites alive - tips, ideas and pitfalls
 - How to make sure a site grows as its parent organisation grows
 - The build - a simple CMS in ASP

The reason we're doing this is because the students said to me 'we
want to know about how to make it easy for clients to update their
sites' and I said, ahah, you mean CMS. So here we are.