The Problem
RSS (Really Simple Syndication) is an XML application for distributing web content that changes frequently. Many news-related sites, weblogs, and other online publishers syndicate their content as an RSS feed to whoever wants it. In this lab, you will write code that extracts information from an RSS (version 2.0) document loaded into an XMLTree object.
RSS 2.0 documents have the following format:
<rss version="2.0">
<channel>
<title>Title goes here</title>
<link>Link goes here</link>
<description>Description goes here</description>
<item>
<title>Optional title goes here</title>
<description>Optional description goes here</description>
<link>Optional link goes here</link>
<pubDate>Optional publication date goes here</pubDate>
<source url="the source URL">Optional source goes here</source>
...
</item>
...
</channel>
</rss>
Note the following properties of RSS 2.0 XML documents:
- The children of the <channel> tag and of the <item> tag can occur in any order; do not assume they will appear in the order above. Furthermore, there can be children of other types not listed above.
- <title>, <link>, and <description> are required children of the <channel> tag, i.e., you should assume they are present. However, <title> and <description> may be blank, i.e., they may have no text child.
- All the children of the <item> tag are optional, i.e., do not assume they are present; however, either <title> or <description> must be present. Even when present, the <title> and/or <description> tags may be blank, i.e., they may have no text child.
- If a <source> tag appears as a child of an <item> tag, it must have a url attribute.
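The properties above suggest looking up children by tag name rather than by position, and treating a tag with no text child as empty text. Here is a minimal sketch of that idea. It uses the JDK's built-in DOM parser instead of XMLTree so that it runs without the course library; the class name RssPropertiesSketch, its helper methods, and the sample feed are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RssPropertiesSketch {

    /** Parses an XML string and returns its root element (JDK DOM, not XMLTree). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Returns the first direct child element with the given tag, or null. */
    static Element firstChild(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(tag)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Returns the text of a child tag, or "" if the tag is absent or blank. */
    static String textOf(Element parent, String tag) {
        Element child = firstChild(parent, tag);
        return child == null ? "" : child.getTextContent().trim();
    }

    public static void main(String[] args) {
        // Children are deliberately out of order, one is blank, one is unlisted.
        String xml = "<rss version=\"2.0\"><channel>"
                + "<link>http://example.com/</link>"
                + "<description></description>"
                + "<language>en</language>"
                + "<title>Example feed</title>"
                + "</channel></rss>";
        Element channel = firstChild(parseRoot(xml), "channel");
        System.out.println("Title: " + textOf(channel, "title"));
        System.out.println("Description: " + textOf(channel, "description"));
        System.out.println("Link: " + textOf(channel, "link"));
    }
}
```

Because every lookup goes through firstChild, the extra <language> child and the reordered tags do not affect the result, and the blank <description> simply yields an empty string.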
Setup
Follow these steps to set up a project for this lab.
- Create a new Eclipse project by copying ProjectTemplate. Name the new project RSSProcessing.
- Open the src folder of this project and then open (default package). As a starting point you can use any of the Java files. Rename the file you choose RSSProcessing and delete the other files from the project.
- Follow the link to RSSProcessing.java, select all the code on that page (click and hold the left mouse button at the start of the program and drag the mouse to the end of the program) and copy it to the clipboard (right-click the mouse on the selection and choose Copy from the contextual pop-up menu), then come back to this page and continue with these instructions.
- Finally in Eclipse, open the RSSProcessing.java file; select all the code in the editor, right-click on it and select Paste from the contextual pop-up menu to replace the existing code with the code you copied in the previous step. Save your file.
Method
- Implement the following static method that, given an
XMLTree and a tag name (a String), searches the children of the
XMLTree for the given tag and returns the index of the first
occurrence of the tag or -1 if the tag does not exist.
/**
 * Finds the first occurrence of the given tag among the children of the
 * given {@code XMLTree} and returns its index; returns -1 if not found.
 *
 * @param xml
 *            the {@code XMLTree} to search
 * @param tag
 *            the tag to look for
 * @return the index of the first child of the {@code XMLTree} matching the
 *         given tag or -1 if not found
 * @requires [the label of the root of xml is a tag]
 * @ensures <pre>
 * getChildElement =
 *  [the index of the first child of the {@code XMLTree} matching the
 *   given tag or -1 if not found]
 * </pre>
 */
private static int getChildElement(XMLTree xml, String tag) {...}
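The contract above describes a linear search over the children. The following sketch shows one way that search could look. It runs on a hypothetical stand-in class, TreeNode, rather than on XMLTree itself, and it assumes that XMLTree offers methods with the same names and meanings (label(), isTag(), numberOfChildren(), child(int)); check the XMLTree documentation before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

class ChildSearchSketch {

    /** Hypothetical stand-in for XMLTree: a label, a tag/text flag, children. */
    static final class TreeNode {
        private final String label;
        private final boolean isTag;
        private final List<TreeNode> children = new ArrayList<>();

        TreeNode(String label, boolean isTag, TreeNode... kids) {
            this.label = label;
            this.isTag = isTag;
            for (TreeNode k : kids) {
                this.children.add(k);
            }
        }

        String label() { return this.label; }
        boolean isTag() { return this.isTag; }
        int numberOfChildren() { return this.children.size(); }
        TreeNode child(int i) { return this.children.get(i); }
    }

    /** Linear scan: index of the first tag child labeled tag, or -1. */
    static int getChildElement(TreeNode xml, String tag) {
        int index = -1;
        int i = 0;
        // Stop at the first match so later duplicates are ignored.
        while (i < xml.numberOfChildren() && index < 0) {
            TreeNode c = xml.child(i);
            if (c.isTag() && c.label().equals(tag)) {
                index = i;
            }
            i++;
        }
        return index;
    }

    public static void main(String[] args) {
        TreeNode channel = new TreeNode("channel", true,
                new TreeNode("link", true,
                        new TreeNode("http://example.com/", false)),
                new TreeNode("title", true,
                        new TreeNode("Example feed", false)),
                new TreeNode("title", true));
        System.out.println(getChildElement(channel, "title"));   // prints 1
        System.out.println(getChildElement(channel, "pubDate")); // prints -1
    }
}
```

The isTag() check matters because a text child whose content happens to equal the tag name should not count as a match.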
- Review the main method skeleton and modify it to
output the title, description, and link of
the RSS channel. Each element in the output should be preceded by
a descriptive label, e.g.,
Title: Yahoo! News - Latest News & Headlines
Description: The latest news and headlines from Yahoo! News.
Link: http://news.yahoo.com/
Run the program and test your implementation. As input you can use any URL of a valid RSS 2.0 feed, e.g., https://news.yahoo.com/rss/.
- Once you are confident that your implementations above are
correct, implement the following static method that, given an
XMLTree whose root is an <item> tag and an output stream,
outputs the title (or the description, if the title is not
available) and the link, if available.
/**
 * Processes one news item and outputs the title, or the description if the
 * title is not present, and the link (if available) with appropriate
 * labels.
 *
 * @param item
 *            the news item
 * @param out
 *            the output stream
 * @updates out.content
 * @requires [the label of the root of item is an <item> tag] and out.is_open
 * @ensures out.content = #out.content * [the title (or description) and link]
 */
private static void processItem(XMLTree item, SimpleWriter out) {...}
Here is an example of what the output might look like:
Title: Tropical Storm Leslie churns northward in Atlantic
Link: http://news.yahoo.com/storm-churns-northward-winds-buffeting-bermuda-144218080.html
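The title-or-description fallback can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK's DOM parser and a java.io.PrintWriter as stand-ins for XMLTree and SimpleWriter, the class and helper names are hypothetical, and it reads "not present" literally, so a <title> tag that is present but blank is printed as an empty title.

```java
import java.io.PrintWriter;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ProcessItemSketch {

    /** Parses an XML string and returns its root element (JDK DOM, not XMLTree). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Text of the first direct child with the given tag, or null if absent. */
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(tag)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    /** Prints the title (or, failing that, the description) and the link. */
    static void processItem(Element item, PrintWriter out) {
        String title = childText(item, "title");
        String description = childText(item, "description");
        if (title != null) {
            out.println("Title: " + title);
        } else if (description != null) {
            // Fallback: only used when no <title> child exists.
            out.println("Description: " + description);
        }
        String link = childText(item, "link");
        if (link != null) {
            out.println("Link: " + link);
        }
    }

    public static void main(String[] args) {
        Element item = parseRoot("<item>"
                + "<link>http://example.com/story</link>"
                + "<description>Only a description here</description>"
                + "</item>");
        PrintWriter out = new PrintWriter(System.out, true);
        processItem(item, out);
        out.flush();
    }
}
```

Returning null for an absent tag, rather than an empty string, is what lets processItem distinguish "tag missing" from "tag present but blank".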
- Back in the main method, add code so that it prints all items in the RSS channel by repeatedly calling processItem. Then run and test your code to make sure it works as intended.
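Since <item> tags can be mixed in with other children of <channel>, the loop in main has to test each child before processing it. The sketch below illustrates that filter with the JDK's DOM API, where getChildNodes().getLength() and item(i) play the role that numberOfChildren() and child(i) would presumably play on an XMLTree. The class name ItemLoopSketch and the method itemIndices are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemLoopSketch {

    /** Parses an XML string and returns its root element (JDK DOM, not XMLTree). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Returns the child positions of every <item> element under channel. */
    static List<Integer> itemIndices(Element channel) {
        List<Integer> result = new ArrayList<>();
        NodeList kids = channel.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            // Skip text nodes and any tag other than <item>.
            if (k.getNodeType() == Node.ELEMENT_NODE
                    && k.getNodeName().equals("item")) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot("<rss version=\"2.0\"><channel>"
                + "<title>Example feed</title>"
                + "<item><title>First</title></item>"
                + "<link>http://example.com/</link>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>");
        Element channel = (Element) root.getElementsByTagName("channel").item(0);
        for (int i : itemIndices(channel)) {
            // In the lab, this is where processItem would be called.
            System.out.println("item found at child index " + i);
        }
    }
}
```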
Additional Activities
- Modify processItem (including updating the comments) so that, in addition to title (or description) and link, it also outputs publication date (tag pubDate) and source (tag source) with the source URL (attribute url of source tag). If any of these elements are not present, output <element> not present (where <element> is replaced by the name of the missing tag).
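A rough sketch of the extra output for pubDate and source follows. As before, it uses the JDK's DOM API rather than XMLTree, and the class name, method names, and label wording ("Publication date:", "Source:") are assumptions made for illustration; only the "<element> not present" message comes from the description above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ItemExtrasSketch {

    /** Parses an XML string and returns its root element (JDK DOM, not XMLTree). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Returns the first direct child element with the given tag, or null. */
    static Element firstChild(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(tag)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Builds the extra output lines for the pubDate and source tags. */
    static List<String> extraLines(Element item) {
        List<String> lines = new ArrayList<>();
        Element pubDate = firstChild(item, "pubDate");
        if (pubDate != null) {
            lines.add("Publication date: " + pubDate.getTextContent().trim());
        } else {
            lines.add("pubDate not present");
        }
        Element source = firstChild(item, "source");
        if (source != null) {
            // RSS 2.0 requires a url attribute on every <source> tag.
            lines.add("Source: " + source.getTextContent().trim()
                    + " (" + source.getAttribute("url") + ")");
        } else {
            lines.add("source not present");
        }
        return lines;
    }

    public static void main(String[] args) {
        Element item = parseRoot("<item><title>Example</title>"
                + "<source url=\"http://example.com/feed\">Example source</source>"
                + "</item>");
        for (String line : extraLines(item)) {
            System.out.println(line);
        }
    }
}
```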