-*- outline -*-

* Main outline

** What is Content Management (CM)?

Used to describe a number of things:

- a set of design principles; a conceptual toolkit
- off-the-shelf products for managing interlinked data and media
- a way of relating to the customer

** When is it appropriate?

Always. The scale of the CMS is the important variable.

Even simple websites can be seen as CMSs. Sometimes one or more of the pieces is a degenerate case, but the system as a whole is nonetheless conceptually a CMS.

Rough rule of thumb: a more complex, automated CMS is needed when you want to present the same information in more than one way, or when the work involved in hand-rolling the presentation of the information starts to exceed the work involved in building a generic information-presenting program.

Other reasons for choosing a more complex design include:

- to provide automated site management, so that new content is presented correctly without manual intervention.
- to provide separation of roles - content editor vs. presentation specialist.

** The structure of a CMS - Model, View, Controller

One of the lessons of software design over the past forty years is that the data itself, and the internal management and book-keeping of the model, must be kept totally separate from the code that deals with presentation and manipulation of the model.

The Smalltalk team at Xerox PARC, while inventing the modern graphical user interface during the 1970s, coined terminology to describe the common patterns they saw in advanced software for personal computers:

*** The Model

The model is the abstract representation of the domain of concern. Examples:

- Wiki pages
- Blog entries, posting categories, trackbacks
- Problem reports, users, product categories

The model (in an ideal world) has no presentation-related interfaces. All of its interfaces are programmatic, and solely domain-related.
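As a minimal sketch, assuming a blog domain, a model whose interface is purely programmatic and domain-related might look like the following. The names (`Entry`, `BlogModel`, `add_entry`, `entries_in_category`) are hypothetical. Nothing here knows about HTML, RSS or any other presentation, and the internal dictionary could be swapped for a database without changing the interface.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Entry:
    """One blog entry: pure domain data, with no presentation attributes."""
    slug: str
    title: str
    body: str
    category: str
    posted: date


@dataclass
class BlogModel:
    """Hypothetical blog model: every method speaks the language of the domain."""
    _entries: dict[str, Entry] = field(default_factory=dict)

    def add_entry(self, entry: Entry) -> None:
        if entry.slug in self._entries:
            raise ValueError(f"duplicate entry: {entry.slug}")
        self._entries[entry.slug] = entry

    def entry(self, slug: str) -> Entry:
        return self._entries[slug]

    def categories(self) -> list[str]:
        return sorted({e.category for e in self._entries.values()})

    def entries_in_category(self, category: str) -> list[Entry]:
        # Newest first: an ordering that belongs to the domain, not to a page layout.
        matches = [e for e in self._entries.values() if e.category == category]
        return sorted(matches, key=lambda e: e.posted, reverse=True)


model = BlogModel()
model.add_entry(Entry("hello", "Hello", "First post.", "news", date(2005, 1, 10)))
print(model.categories())  # prints ['news']
```

An HTML page, an RSS feed and a GUI client could all be built on these same few methods.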
Of course, in the real world it is sometimes unclear where the correct interface between model and view (described next) should lie. This is why refactoring is so important.

While the model is an abstract representation of the domain, it must of course have a concrete implementation itself. The implementation should not affect the interface to the model either - again, an ideal that can only be approached asymptotically.

*** The View

The view is the concrete representation of the domain of concern. Examples:

- HTML web pages (and forms)
- RSS feeds
- client-side GUIs

Often the view and the controller (described next) are tightly integrated. In fact, it can be very difficult to see controllers as independent objects, which is why many modern systems tend to combine the view and controller roles into single objects. This sometimes has benefits (e.g. Morphic) and sometimes disadvantages (e.g. Swing).

*** The Controller

The controller is the aspect of the system that responds to user actions, updating the model and hence the view. The controller integrates the entire system into a running application.

HTML forms make use of the controller functionality implicit in the browser. When the web browser accepts user input into form fields, marshals the data, and submits it to the web server, that's half of the controller story. The other half is how the web server processes the submitted data, updates the model, and generates a fresh view to send back to the browser.

*** Recursion of the layers

Each layer seems to have aspects of the others within it. Deciding where best to make the cuts between the layers in your application is a skill that takes practice. Experts get it wrong all the time (e.g. Swing).

*** Web Frameworks

In modern web frameworks, we are starting to see a clearer separation of the three MVC layers.
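The server-side half of the controller story can be sketched in Python as follows, assuming a simple wiki-page model. `WikiModel`, `render_page` and `handle_edit_post` are hypothetical names. The controller decodes an `application/x-www-form-urlencoded` body of the kind a browser submits, updates the model, and asks the view for a fresh page. The model holds only domain data; the view is the only code that knows about HTML.

```python
from html import escape
from urllib.parse import parse_qs


class WikiModel:
    """Model: wiki pages keyed by name, with a purely domain-level interface."""

    def __init__(self) -> None:
        self._pages: dict[str, str] = {}

    def save_page(self, name: str, text: str) -> None:
        self._pages[name] = text

    def page_text(self, name: str) -> str:
        return self._pages.get(name, "")


def render_page(name: str, text: str) -> str:
    """View: turns domain data into an HTML fragment, escaping user-supplied text."""
    return f"<h1>{escape(name)}</h1>\n<p>{escape(text)}</p>"


def handle_edit_post(model: WikiModel, body: str) -> str:
    """Controller: decode the submitted form, update the model, render a fresh view."""
    fields = parse_qs(body, keep_blank_values=True)
    name = fields.get("page", [""])[0].strip()
    text = fields.get("text", [""])[0]
    if not name:
        raise ValueError("missing page name")
    model.save_page(name, text)
    return render_page(name, model.page_text(name))


model = WikiModel()
print(handle_edit_post(model, "page=FrontPage&text=Hello+%26+welcome"))
```

Because the model never sees HTML, a second view (an RSS item, say) could be added without touching `WikiModel` or the controller's update logic.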
Often a database is used to make the data backing the model persistent; the model itself is usually a (fairly thin) layer atop the database; the view is often an XML-transformation-based presentation pipeline; and the controller is partly implicit in the browser and partly implemented as another thin layer atop the model.

** The model is the key idea in a CMS

The most important thing for your application is the data, and the way the data should be interpreted. This cannot be stressed enough. Presentation and data representation pale in comparison to the importance of getting the interface to the model right.

The interface should focus on maintaining an appropriate level of abstraction of the model. The database backing the model should be both readily accessible and presentation-neutral.

There are two slippery slopes in model design: you can get too close to the representation (the way data is stored), or too close to the presentation (the way data is displayed or edited). It can be *very* difficult to keep away from these two attractors.

Refactoring is vital: if the interface to the model isn't quite right, exposing too much representation or too much presentation, it must be corrected for the health of the system, and the sooner this is done, the better.

*** Digression: reflection - where is the meaning?

Where is the *meaning* of a piece of information? What makes a model meaningful?

The meaning of a set of information is partly in the information itself, and partly in the interpretation of the information.

(Example: a sentence seen as bits, as hex, as ASCII characters, as words, as a parse tree, as a symbol.)

** Interpreting the model

When you present the information in the database to the user, you're choosing an interpretation. Make sure the level of abstraction fits the intended communication, even if the representation of the information is lower-level.

Use the tools your language gives you to work at the correct level of abstraction.
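The digression's example, and the point that presenting data means choosing an interpretation, can be made concrete in a few lines of Python. The same stored bytes are read at several levels of abstraction; the sentence and the function name `interpretations` are arbitrary choices for illustration.

```python
def interpretations(stored: bytes) -> dict[str, object]:
    """Read the same stored bytes at increasing levels of abstraction."""
    text = stored.decode("ascii")  # characters, via the ASCII encoding
    return {
        "bits": " ".join(f"{b:08b}" for b in stored),  # lowest level: raw bits
        "hex": stored.hex(),                            # the same bits, grouped into nibbles
        "text": text,                                   # a string of characters
        "words": text.split(),                          # a sequence of words
    }


stored = b"Keep sites alive"
for level, value in interpretations(stored).items():
    print(f"{level:>5}: {value}")
```

None of these readings is wrong; which one is meaningful depends on the communication intended, and that choice is exactly what a view makes.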
There's no point in grovelling over a set of tables when you could issue a high-level query. High-level queries free you from the representation, so you can change it later if need be.

(Aside: when using SQL databases, views and stored procedures can help isolate you from changes in representation.)

** Encodings

Your job in planning your CMS is to choose an appropriate level of abstraction for your domain, and to design a little language for talking about and manipulating objects in the domain. Your server-side scripts will then use this little language.

Languages can scale from a simple set of subroutines through to a full-blown programming language with its own syntax, semantics and IDE.

If there's not a great mismatch between your domain language and the language implemented by an off-the-shelf CMS, you can wedge your application into it. This is called encoding: you encode your domain language in someone else's domain language.

CMSs don't usually have good support for automatic encoding, so you have to do it by hand. Beyond a certain level of mismatch it becomes cheaper to write your own tool: either to encode into another CMS's domain, or to manage your domain directly.

** Two ways of working with Wiki

Wiki markup, in any implementation, can be seen in two different ways: either as a set of instructions for spitting out HTML tags, or as a description of an HTML document. It's both data and, *simultaneously*, program code for a simple HTML-oriented DSL.

** Wikis and Creeping Featuritis

The little languages representing a domain of concern can grow, sometimes quite quickly and unexpectedly.

As an example, the original Wiki by Ward Cunningham supported only a few kinds of markup, and was not extensible. As more people took up the idea and started building wikis for themselves, Wiki markup grew into what is essentially a small, ad-hoc, domain-specific programming language, different for each Wiki implementation.
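The two readings of Wiki markup described above can be contrasted in a small Python sketch. The markup subset is hypothetical (a line starting with `= ` is a heading, any other non-blank line is a paragraph); real wikis differ. `emit_html` treats the markup as instructions, writing tags as it scans; `parse` treats it as a description, building a presentation-neutral document tree that `render_html`, `render_text` or any other projection can walk.

```python
from html import escape


def emit_html(markup: str) -> str:
    """Markup as a program: each line is an instruction to print some tags."""
    out = []
    for line in markup.splitlines():
        if line.startswith("= "):
            out.append(f"<h1>{escape(line[2:])}</h1>")
        elif line.strip():
            out.append(f"<p>{escape(line)}</p>")
    return "\n".join(out)


def parse(markup: str) -> list[tuple[str, str]]:
    """Markup as data: a tree (here, a flat list) of (node type, text) pairs."""
    doc = []
    for line in markup.splitlines():
        if line.startswith("= "):
            doc.append(("heading", line[2:]))
        elif line.strip():
            doc.append(("para", line))
    return doc


def render_html(doc: list[tuple[str, str]]) -> str:
    tags = {"heading": "h1", "para": "p"}
    return "\n".join(f"<{tags[k]}>{escape(t)}</{tags[k]}>" for k, t in doc)


def render_text(doc: list[tuple[str, str]]) -> str:
    """A second projection of the same tree, with no HTML at all."""
    return "\n".join(t.upper() if k == "heading" else t for k, t in doc)


src = "= Welcome\nHello & goodbye"
print(emit_html(src))
print(render_text(parse(src)))
```

Both routes produce the same HTML. The difference is that the tree can also be rendered as plain text, RSS or anything else, which is one argument for the "description" reading as the markup language grows.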
Perhaps it would have been better to start with a full programming language, augmented with primitives for the domain of concern? Something like Skribe might make a good choice for a Wiki markup language.

(Examples: Pyle, and its growth from a simple wiki to a plugin-based, scriptable one; TiddlyWiki, and its nascent reflective capability with plugin tiddlers.)

** Features of a CMS

*** Access control and permissions

If the organisation managing the database is large, the CMS is often required to support separate user logins, each with its own permission set. Certain areas of the database would be restricted on a per-account basis.

Techniques for managing permission sets include capabilities and Access Control Lists (ACLs).

*** Auditing, change tracking and RSS

One feature commonly requested by larger organisations is auditing: the ability to track each change made to the system. The system records who made the change, a summary of the change itself, and a timestamp.

Audit systems can be extended into the realm of version control and change tracking: once you have a record of each change, so long as the record is detailed enough, you can selectively undo changes, effectively allowing you to time-travel through the different versions of the database.

Another use for audit logs is to publish RSS feeds broadcasting changes to the site.

** Off-the-shelf, or DIY?

Off-the-shelf: be careful about mismatches, where the encoding is too complex. You get a lot of important things already implemented! (Authentication and authorisation, sometimes change tracking, input validation, DB consistency, many-eyes-make-shallow-bugs, user community etc.)

DIY: be careful about maintainability. Detailed (post-hoc) documentation of the model and representation is essential. You often need to reinvent the wheel. The main benefit is control over the encoding of the data, and over the model's performance characteristics.
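Returning to auditing and change tracking, here is a minimal Python sketch of an audited store, assuming a simple key/value model of pages. Every change is logged with who made it, a summary, a timestamp and the previous value, which is enough to undo changes and to publish the log as an RSS 2.0 feed. `AuditedStore`, its methods and the feed title/link are hypothetical; a real system would persist the log rather than keep it in memory, and would need a smarter undo than the naive one shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Change:
    who: str
    summary: str
    when: datetime
    key: str
    old: Optional[str]  # None means the key did not exist before the change
    new: Optional[str]  # None means the change deleted the key


class AuditedStore:
    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self.log: list[Change] = []

    def _apply(self, who: str, summary: str, key: str, new: Optional[str]) -> None:
        old = self._data.get(key)
        if new is None:
            self._data.pop(key, None)
        else:
            self._data[key] = new
        self.log.append(Change(who, summary, datetime.now(timezone.utc), key, old, new))

    def set(self, who: str, summary: str, key: str, value: str) -> None:
        self._apply(who, summary, key, value)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def undo(self, who: str, index: int) -> None:
        """Naive undo: restore the value the key had before change `index`.
        This also discards any later edits to that key. The undo is itself audited."""
        change = self.log[index]
        self._apply(who, f"undo: {change.summary}", change.key, change.old)

    def rss(self, title: str, link: str) -> str:
        """Publish the audit log as a minimal RSS 2.0 document, newest change first."""
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = title
        ET.SubElement(channel, "link").text = link
        ET.SubElement(channel, "description").text = f"Recent changes to {title}"
        for change in reversed(self.log):
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = f"{change.key}: {change.summary}"
            ET.SubElement(item, "description").text = f"Changed by {change.who}"
            ET.SubElement(item, "pubDate").text = format_datetime(change.when)
        return ET.tostring(rss, encoding="unicode")


store = AuditedStore()
store.set("alice", "create home page", "home", "Welcome!")
store.set("bob", "reword home page", "home", "Hello!")
store.undo("alice", 1)  # "home" is "Welcome!" again
print(store.rss("Example site", "http://example.com/"))
```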
Sometimes a simple CMS can have its model split out, and the view can be replaced with an off-the-shelf user interface toolkit or framework.

** Some examples

- TiddlyWiki
- WordPress
- Zope
- Pyle (?)

** Aspects of implementing your model

*** Data representation and persistence

Databases - denormalised, normalised; normalising information.

Relational DBs are a badly-engineered hack on an elegant piece of maths (http://web.onetel.com/~hughdarwen/TheThirdManifesto/HAVING-A-Blunderful-Time.html and Darwen & Date in general), plus industrial-strength concurrency control (ACID properties).

A denormalised database is OK, but when it needs changing, refactoring is really important. You can normalise a table and replace it with a view, modulo insertion. Adding a column to a denormalised table means you have to alter each query that references that table; normalised databases don't have that problem.

Basic SQL.

SQL replication and replication in general. Distribution becomes much easier once you have a suitable choice of abstraction (neither representation- nor presentation-oriented).

RDF - frame representation. Graphs. The neat trick is the standardised metadata: you can describe edge categories. An extension to RDF lets you annotate individual edges, too.

Hibernate - use example of perfmap. Compare to the hand-rolled model in simpleblog.

*** Distribution

Keeping the level of interaction between your model and your view correct means that you can run the model headless, and separate the presentation layer completely.

This makes for flexibility in terms of topology: a single model server can serve many different view/controller clients distributed around the network. For instance, several intranets could all access the same model and present the data differently. Traditional GUI client applications can be used alongside HTML frontends.
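To illustrate normalising a table and replacing it with a view (and the joins the Exercise calls for), here is a sketch using Python's built-in `sqlite3` module. The schema is a hypothetical blog: `author`, `category` and `post` tables, plus a `post_flat` view presenting the columns a single denormalised `post_flat` table might once have had, so read-only queries written against that old table keep working.

```python
import sqlite3

SCHEMA = """
CREATE TABLE author   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE post (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    author_id   INTEGER NOT NULL REFERENCES author(id),
    category_id INTEGER NOT NULL REFERENCES category(id)
);
-- The view stands in for the old denormalised table: same name, same columns.
CREATE VIEW post_flat AS
    SELECT post.id, post.title,
           author.name AS author, category.name AS category
    FROM post
    JOIN author   ON author.id   = post.author_id
    JOIN category ON category.id = post.category_id;
"""


def build(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO category (id, name) VALUES (?, ?)",
                     [(1, "news"), (2, "howto")])
    conn.executemany(
        "INSERT INTO post (id, title, author_id, category_id) VALUES (?, ?, ?, ?)",
        [(1, "Hello", 1, 1), (2, "Backups", 2, 2), (3, "Launch", 1, 1)])


conn = sqlite3.connect(":memory:")
build(conn)
# A query written against the old denormalised table still works unchanged.
for row in conn.execute(
        "SELECT title, author FROM post_flat WHERE category = 'news' ORDER BY id"):
    print(row)
```

Renaming a category now touches one row in `category` rather than every post. The "modulo insertion" caveat shows up here: SQLite views are read-only, so inserting through `post_flat` would need an `INSTEAD OF INSERT` trigger, or callers changed to insert into the base tables.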
** Aspects of implementing the view(s) and controller(s)

In general, programs for editing content can be kept completely separate from programs for viewing content. In the case of simple encodings into SQL or the file system, normal SQL- or file-system-management tools can be used as the editor. When the domain becomes complex, specialised editors might have to be built, but they can still be kept completely separate from the presentations, since the data is accessed at the correct level of abstraction for the domain.

(Examples of different projections: HTML; RSS; RDF; GUI client; Emacs interface via FTP server (Pyle).)

** Leaky Abstractions

** Keeping Sites Alive - tips, ideas and pitfalls

** How to make sure a site grows as its parent organisation grows

** Building a simple CMS in ASP

** MUDs/MOOs as a CMS

* Things I want to cover

XML: tips for XML - use namespaces! Beware of old parsers (and XPath), though.

XML represents trees (show infoset), whereas RDF represents graphs.

Alternatives include JSON. Mention XMLHttpRequest?

Reasons Not To Use Zope. Unless that's your target DSL.

Some LShift examples:

- MNM perfmap (non-CMS)
- Expro (RDF-like)
- NMK
- bathomebase (simple image-based CMS?)
- SSM Coal (Perl CMS, very simple and rigid)

* Exercise

blosxom-like blog:

- author
- category
- post

More than one table, to illustrate joins.

Compare the JSON implementation to FSDB, as blosxom/gyre/fuschia use:

- FSDB is an encoding of the data structures of the blog into the structures of the FS. Hierarchy of dirs -> hierarchy of categories etc.
- Use normal FS commands (cp, mv, rm, vi) to manage the database, which obviates the need for a management/admin tool

Compare to an SQL database - it's still an encoding, into tables this time:

- hierarchy is not as natural with SQL (although the proper relational calculus does better with this)
- use SQL table-management commands to manage content; you can still do without a management/admin tool

* Excerpts from Emails to Rachel from me

Mikeb and I sat down and had a chat about the workshops the other day, and came up with a few ideas and a few questions.

We don't really have any idea what level the students will be at, but we thought we might compare and contrast a few different CMSs, looking at the separation of concerns in design and implementation. We thought it would tie in with the theme "Keeping Websites Alive" by emphasising the structured data kept at the back end of each CMS, and the separation of the data itself from the layers of processing and rendering for display. We might also briefly mention RSS/RDF. Obviously, given our technical focus, we'll try to stick to the *technical* side of keeping sites alive rather than the *marketing* side!

Mikeb mentioned there were course notes for the year he taught the workshops: are there any for this year? We looked at the wiki but couldn't find anything obviously intended for 2005.

* Excerpts from Emails from Rachel

** Hardware in labs

iMacs. Tried to get PCs in, to no avail, sadly.

** Software on Macs

Dreamweaver.

** Server capabilities

Each student has their own virtual server which they can use for ASP projects. For some reason I couldn't get the Access drivers to work on them, so we have been using text-based databases, which have worked well for small projects.

** Student capabilities

I have taught them XHTML and CSS.
They know enough ASP 3.0 to be getting on with (I know it's an old technology, but it's the only one cheap enough to give each student their own managed server, and it's the principles of server-side stuff I want them to get their heads round, not one particular technology).

They know the principles involved in flat-file databases and SQL statements etc., but I haven't taught them much about the relational side of things.

They have also had about 16 hours of teaching in ActionScript 1.0, so JavaScript will hopefully not be totally foreign, but we're not talking tech-heads here.

** Excerpts from the original email from Rachel to Andy

Following our conversation, here's a bit more detail about what I and the students would like you to cover / explore.

We have a budget for 15 hours' worth of teaching, so there's a reasonable amount of room there, although, as with most of the subjects I teach, I know you could probably do an MSc in it and still leave wanting more.

I have now taught the students some ASP, not because it's the best technology (not by a very long way) but because the cost of hosting it has allowed us to provide each student with their own sandbox server, something other courses would be unable to do with a server-side technology.

So they now understand the principles of dynamic sites, but I think they need to be taught and shown:

- Intro to the principles - what is CM and is it appropriate for a given problem?
- Off-the-shelf or DIY?
- Keeping sites alive - tips, ideas and pitfalls
- How to make sure a site grows as its parent organisation grows
- The build - a simple CMS in ASP

The reason we're doing this is that the students said to me 'we want to know about how to make it easy for clients to update their sites' and I said, ahah, you mean CMS. So here we are.