At the Dart hackathon I got a few questions about running applications on the server. The best way to answer them was to give the hackers a code sample. It is deliberately very simple code, but I'm sure you can take it to the next level without any problem.
    #import('dart:io');
    #import('dart:uri');
    #import('dart:json');

    // Dart Hackathon TLV 2012
    //
    // A simple example that fetches an RSS/JSON feed and parses it on the server side.
    // This is a good start for a crawler that fetches info and parses it.
    //
    // Author: Ido Green | greenido.wordpress.com
    // Date: 28/4/2012
    //
    class Crawler {
      String _urlToFetch = "http://feeds.feedburner.com/html5rocks";
      String _dataFileName = "webPageData.json";
      HttpClient _client;
      var rssItems;

      // Ctor.
      Crawler() {
        _client = new HttpClient();
      }

      // Fetch the page and save the data locally
      // in a file so we can process it later.
      fetchWebPage() {
        // Get all the updates of h5r
        Uri pipeUrl = new Uri.fromString(_urlToFetch);
        // Open a GET connection to fetch this data
        var conn = _client.getUrl(pipeUrl);
        conn.onRequest = (HttpClientRequest request) {
          // Nothing to send in the body of a GET, so close the request right away
          request.outputStream.close();
        };
        conn.onResponse = (HttpClientResponse response) {
          // statusCode is an int, so interpolate it instead of concatenating
          print("status code: ${response.statusCode}");
          var output = new File(_dataFileName).openOutputStream();
          response.inputStream.pipe(output);
          // In case you want to print the data to your console:
          // response.inputStream.pipe(stdout);
        };
      }

      // Read the data file and hand its content to parsePage().
      readFile() {
        File file = new File(_dataFileName);
        if (!file.existsSync()) {
          print("Err: Could not find: " + _dataFileName);
          return;
        }
        InputStream file_stream = file.openInputStream();
        StringInputStream lines = new StringInputStream(file_stream);
        String data = "";
        lines.onLine = () {
          String line;
          while ((line = lines.readLine()) != null) {
            //print ("== "+line);
            data += line;
          }
        };
        lines.onClosed = () {
          print("Got to the end of: " + _dataFileName);
          print("This is our file content:\n" + data);
          parsePage(data);
        };
      }

      //
      // Basic (real basic) parsing
      //
      parsePage(data) {
        // Cut out the interesting part of the feed
        int start = data.indexOf("<title>");
        int end = data.lastIndexOf("</channel>");
        // indexOf/lastIndexOf return -1 when the marker is missing
        if (start == -1 || end == -1 || end < start) {
          print("Err: Could not find the feed items in the data");
          return;
        }
        var feed = data.substring(start, end);
        // Put the items in an array
        rssItems = feed.split("<title>");
        for (var item in rssItems) {
          print("\n** Item: " + item);
        }
      }
    } // End of class

    //
    // Start the party
    //
    void main() {
      Crawler crawler = new Crawler();
      // Note: fetchWebPage() is asynchronous, so on a first run readFile()
      // may execute before the data file has been written.
      crawler.fetchWebPage();
      crawler.readFile();
    }
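The listing above is dated April 2012 and uses the `dart:io` API of that time. On a current Dart SDK the same fetch-then-read flow might look roughly like the sketch below. It assumes Dart 2.12 or later, and `fetchToFile` is a hypothetical helper name that is not part of the original sample. Awaiting the download before reading the file also avoids reading it before it has been written.

```dart
import 'dart:io';

/// Downloads [url] into [fileName] and completes only once the file
/// has been fully written. Returns the HTTP status code.
/// (Hypothetical helper, not part of the original sample.)
Future<int> fetchToFile(Uri url, String fileName) async {
  final client = HttpClient();
  try {
    final request = await client.getUrl(url);
    // close() sends the request and completes when the response headers arrive.
    final response = await request.close();
    // pipe() streams the body to disk and closes the file sink when done.
    await response.pipe(File(fileName).openWrite());
    return response.statusCode;
  } finally {
    client.close();
  }
}

Future<void> main() async {
  final url = Uri.parse('http://feeds.feedburner.com/html5rocks');
  const fileName = 'webPageData.json';
  try {
    final status = await fetchToFile(url, fileName);
    print('status code: $status');
    // Safe to read now: the download has completed.
    final data = await File(fileName).readAsString();
    print('Read ${data.length} characters from $fileName');
  } on Exception catch (e) {
    print('Fetch failed: $e');
  }
}
```

The parsing step from the original sample could then be called on `data` once the read has finished.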
This example could be considered version 0.01 of a real crawler. A real first version would need features like:
- Discovery – Be able to get links from the current page and follow them. This is much harder than it sounds, as you want to make sure the crawl won't continue forever.
- Parsing – Parse the information on the page. Try to extract the metadata and add it to the 'real' content (what counts as real content depends on your goals for the crawler).
- Analysis – That is, normalize the information from the page and put it in storage (a DB, a file, a cloud solution, etc.).
- Logging & Monitoring – As this server-side process will run while you are sleeping, it's best to have a good 'watchdog' on it. Start with some simple logging and analysis of the logs. The second step is to use a tool to monitor the process.
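To make the discovery point concrete, here is a minimal sketch of one way to keep a crawl from running forever: a breadth-first walk that remembers visited URLs and stops at a depth and page limit. It is written in current Dart syntax (not the 2012 API of the sample). The names `discover`, `LinkExtractor`, `maxDepth` and `maxPages` are hypothetical, and a small in-memory map stands in for real page fetching and link extraction.

```dart
import 'dart:collection';

// Hypothetical stand-in for "fetch a page and extract its links".
// A real crawler would download the page and pull out the hrefs.
typedef LinkExtractor = List<String> Function(String url);

/// Breadth-first discovery starting at [seed], bounded by [maxDepth]
/// (link hops from the seed) and [maxPages] (total pages visited).
List<String> discover(String seed, LinkExtractor linksOf,
    {int maxDepth = 2, int maxPages = 100}) {
  final visited = <String>{seed};
  final order = <String>[];
  final queue = Queue<MapEntry<String, int>>()..add(MapEntry(seed, 0));

  while (queue.isNotEmpty && order.length < maxPages) {
    final entry = queue.removeFirst();
    order.add(entry.key);
    if (entry.value >= maxDepth) continue; // do not follow links any deeper
    for (final link in linksOf(entry.key)) {
      // Set.add returns false for a URL we have already seen,
      // which is what breaks cycles such as a -> b -> a.
      if (visited.add(link)) {
        queue.add(MapEntry(link, entry.value + 1));
      }
    }
  }
  return order;
}

void main() {
  // A tiny made-up "web" with cycles, used instead of real HTTP requests.
  const web = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': ['d'],
    'd': ['e'],
    'e': ['a'],
  };
  final pages =
      discover('a', (url) => web[url] ?? const <String>[], maxDepth: 2);
  print(pages); // prints [a, b, c, d]
}
```

Page `e` is never visited because it sits three hops from the seed, beyond `maxDepth`. The visited set handles loops, while the two limits bound the crawl even on a site that generates endless new URLs.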
Key lessons:
- There is a real need for libraries that make parsing better: XPath, DOM to Map (or Array), etc.
- Debugging in the editor could be improved. As a first step you might want to use a logging library that gives you a lot of information at each step.
- The editor makes the development phase very pleasant, with warnings on (almost) every mistake you might make. I found it very productive to be back in the good hands of an 'IDE'.
- I guess that in the near future we will see some good examples that use the Dart VM on the server. It's going to be interesting to profile their performance and see where we stand vis-à-vis other modern languages such as Scala.
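On the first lesson: short of a proper XML or DOM library, a regular expression can already do a little better than splitting on `<title>`. The sketch below uses current Dart syntax and only the core `RegExp` class. The function name `extractTitles` and the sample feed are made up for illustration, and it assumes plain-text titles (no CDATA sections or nested tags), so treat it as a stopgap rather than a real parser.

```dart
/// Returns the text of every <title> element in [feed].
/// Assumes plain-text titles (no CDATA sections or nested tags).
List<String> extractTitles(String feed) {
  // dotAll lets '.' match newlines, in case a title spans several lines.
  final titlePattern = RegExp(r'<title>(.*?)</title>', dotAll: true);
  return titlePattern
      .allMatches(feed)
      .map((match) => match.group(1)!.trim())
      .toList();
}

void main() {
  // Made-up sample feed, only for illustration.
  const feed = '''
<channel>
  <title>Sample feed</title>
  <item><title>First post</title></item>
  <item><title> Second post </title></item>
</channel>''';
  for (final title in extractTitles(feed)) {
    print('** Item: $title');
  }
}
```

Unlike the `split` approach, each result here is just the title text, without the rest of the item markup trailing after it.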