If all you need is just one more RSS feed
Not so long ago I talked to Boris about Magnolia's new website and the problems with displaying developers' blogs using the existing RSS paragraph. While this paragraph can display the content of a feed on the fly, it has several issues.
The first is connectivity. The paragraph doesn't keep the content of the feed, just a URL to it, and has to obtain a fresh version every time it is displayed. In my experience this doesn't always work. Whether the feed is unavailable because the server producing it is too busy, or because some network fluctuations are in play, I've seen many clients intermittently fail to obtain feed data. With an average RSS reader this is normally not an issue: the reader has the previous version of the feed locally and treats such a situation as if there were no updates at the moment. Unfortunately this is not true for the RSS paragraph.
Another side effect of the current RSS paragraph is that it has to retrieve and parse the feeds for every client who views the paragraph. This of course generates unnecessary load on the server.
The evening after the discussion it occurred to me that in Magnolia we already have all the infrastructure we need to make handling feeds more robust.
The key is the data module with its support for scheduled tasks. With the right structure defined, it should be easy to have a data importer periodically check for a fresh version of the feed and bring it into the repository. Since we have a content repository, it would be silly to just import the RSS XML when we can convert it into content during the same step. And with the content of all the feeds of interest in the repository, we can easily create combined feeds in the "planet" style.
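To make the "keep the previous copy" idea concrete, here is a minimal sketch in plain Java. It is not the module's code: the class name LastGoodFeedStore and its methods are made up for illustration, and the feed is treated as an opaque string rather than repository content. The only point it shows is that a failed fetch leaves the stored copy untouched.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch: remember the last feed content that was imported successfully. */
public class LastGoodFeedStore {
    // Last successfully imported copy; empty until the first successful refresh.
    private volatile String content = "";

    /** Tries to fetch a fresh copy. On any failure the stored copy is left as it was. */
    public boolean refresh(Callable<String> fetcher) {
        try {
            String fresh = fetcher.call();
            if (fresh == null || fresh.isEmpty()) {
                return false; // treat an empty response like "no updates at the moment"
            }
            content = fresh;
            return true;
        } catch (Exception e) {
            return false; // busy server or network hiccup: keep serving the old copy
        }
    }

    public String getContent() {
        return content;
    }

    public static void main(String[] args) {
        LastGoodFeedStore store = new LastGoodFeedStore();
        store.refresh(() -> "<rss>first version</rss>");
        store.refresh(() -> { throw new java.io.IOException("feed server busy"); });
        System.out.println(store.getContent()); // prints <rss>first version</rss>
    }
}
```

Rendering code would only ever call getContent(), so a temporarily unreachable feed never shows up as an error on the page.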
A few hours later (well, really a few days later, as I was busy with the 3.6/3.6.1 release of Magnolia at the time) the RSSAggregator module was born. For now all it can do is let you define feed aggregates, periodically update their content and render them in paragraphs, but hopefully I'll have some time to extend its functionality further.
It all starts with defining your aggregate:
You can put an unlimited number of feeds into the aggregate. If you don't like the title of a feed, you can optionally define your own (you will see where this is used later). You can also define filters here if you are interested in only some of the entries from the feed. In the first version, Categories, Author, Title and Description can be used for filtering. The text you enter is a normal regex acceptable to String.matches(), so using "AND Category .*" would have the same effect as not using a filter at all, while, for example, using "AND Category UK" on a BBC news RSS feed would give you only news from the UK. The tricky part is to remember that filters are currently applied to all the entries in the aggregate, not just to one feed.
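One thing worth keeping in mind about String.matches() is that the pattern has to match the whole string, not just a part of it. The small sketch below shows that behaviour on category values. The class and method names are invented for the example, and the "at least one category matches" rule is an assumption, not necessarily how the module combines its filters.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of regex filtering on entry categories with String.matches(). */
public class CategoryFilterSketch {

    /** True if at least one category matches the regex in its entirety (assumed rule). */
    static boolean anyCategoryMatches(List<String> categories, String regex) {
        for (String category : categories) {
            if (category.matches(regex)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> ukPolitics = Arrays.asList("UK Politics");
        List<String> uk = Arrays.asList("World", "UK");

        System.out.println(anyCategoryMatches(uk, "UK"));             // true: "UK" matches exactly
        System.out.println(anyCategoryMatches(ukPolitics, "UK"));     // false: whole string must match
        System.out.println(anyCategoryMatches(ukPolitics, "UK.*"));   // true: prefix plus wildcard
        System.out.println(anyCategoryMatches(ukPolitics, ".*"));     // true: matches any category
    }
}
```

So if a feed uses categories such as "UK Politics", a pattern like "UK.*" or ".*UK.*" is the safer choice.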
Once you have defined your feed, you can start playing around with the refresh interval for the feed data. This is one of the main differences compared to the current RSS paragraph: rather than trying to obtain fresh feed data on every display of the paragraph, the data module checks for it periodically, regardless of whether or how many times the paragraph(s) associated with the feed have been displayed. All you have to do is select an update interval for the RSSAggregator module. Alternatively, you can trigger the update manually.
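The decoupling of refreshing from rendering can be pictured with nothing more than the JDK's ScheduledExecutorService. This is only an illustration of the idea. In Magnolia the scheduling is handled by the data module, and the class name, the counter standing in for "import the feed", and the very short interval below are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: refresh on a timer, independent of how often pages are viewed. */
public class ScheduledRefreshSketch {

    /** Schedules a refresh task at a fixed rate and waits until it has run `runs` times. */
    static int runRefreshes(int runs, long intervalMillis) {
        AtomicInteger refreshCount = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Stand-in for "fetch the feed and store it"; no page view is involved.
        scheduler.scheduleAtFixedRate(() -> {
            refreshCount.incrementAndGet();
            done.countDown();
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS); // give up after 5 seconds at the latest
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow(); // stop the timer so the JVM can exit
        }
        return refreshCount.get();
    }

    public static void main(String[] args) {
        // A real interval would be minutes or hours; 20 ms keeps the demo quick.
        System.out.println("refreshes performed: " + runRefreshes(3, 20));
    }
}
```

The number of refreshes depends only on the interval and the elapsed time, which is exactly why server load no longer grows with the number of visitors.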
Now that we have the feed aggregate configured and refreshed automatically, we can look at what to do with the data. The module already comes with two paragraphs, but I'm sure there is still plenty of room for creativity.
The feedListParagraph renders the feeds from the aggregate one below the other (in the order in which they were defined), using the feed title (or the custom title, if one was specified earlier) as a heading for each feed. It also lets you configure the number of entries shown for each feed, their ordering and how much of each entry's description is displayed.
The combinedFeedParagraph, on the other hand, takes the entries from each individual feed in the aggregate and combines them into one giant feed. Again it lets you specify the ordering, the number of entries displayed and how much of each entry's description is displayed.
The paragraph can also optionally provide a link to the RSS of the combined feed, so instead of subscribing to individual feeds you can have Magnolia combine them all and then subscribe to the aggregated feed. This is useful, for example, if you want to create one feed from the individual blogs of all the developers working on your project.
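Conceptually, combining feeds boils down to pouring all entries into one list, sorting them and cutting the list off at the configured size. The sketch below shows that with a made-up Entry class and newest-first ordering. The names, the sample dates and the choice of ordering are assumptions for illustration and not the module's actual data model.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: merge entries of several feeds into one list, newest first. */
public class CombinedFeedSketch {

    /** Minimal stand-in for a stored feed entry. */
    static final class Entry {
        final String feedTitle;
        final String title;
        final Instant published;

        Entry(String feedTitle, String title, Instant published) {
            this.feedTitle = feedTitle;
            this.title = title;
            this.published = published;
        }
    }

    /** Merges all feeds, sorts by publication date (newest first), keeps at most maxEntries. */
    static List<Entry> combine(List<List<Entry>> feeds, int maxEntries) {
        List<Entry> all = new ArrayList<>();
        for (List<Entry> feed : feeds) {
            all.addAll(feed);
        }
        all.sort(Comparator.comparing((Entry e) -> e.published).reversed());
        return new ArrayList<>(all.subList(0, Math.min(maxEntries, all.size())));
    }

    /** Convenience helper returning just the titles of the given entries. */
    static List<String> titlesOf(List<Entry> entries) {
        List<String> titles = new ArrayList<>();
        for (Entry e : entries) {
            titles.add(e.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Entry> blogA = Arrays.asList(
                new Entry("Blog A", "A1", Instant.parse("2008-07-01T10:00:00Z")),
                new Entry("Blog A", "A2", Instant.parse("2008-07-03T10:00:00Z")));
        List<Entry> blogB = Arrays.asList(
                new Entry("Blog B", "B1", Instant.parse("2008-07-02T10:00:00Z")));

        System.out.println(titlesOf(combine(Arrays.asList(blogA, blogB), 2))); // prints [A2, B1]
    }
}
```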
There is certainly still room for improvement. Here is my personal TODO list. Feel free to suggest more useful things the module could provide.
- Allow filters to be specified per feed rather than per aggregate
- Have different refresh times for each aggregate (at the moment they are all refreshed at the same interval)
- Offer a choice of feed types (RSS, Atom) when generating the combined feed
- Store more details about each individual feed/entry (right now Title, Link, ID, Description, Author, Publication date, Categories are stored)
Just a few remaining details:
- The module uses the Rome project to process and generate RSS feeds
- You can get Magnolia from www.magnolia.info
- You can get the source of the RSSAggregator module from svn