
Dart Crawler Example

At the Dart hackathon I got a few questions about server-side applications. The best way to answer was to give the hackers a code sample. It is deliberately very simple code, but I’m sure that you can take it to the next level without any problem.

This example could be considered version 0.01 of a real crawler. A real first version would need features like:

  • Discovery – Be able to get links from the current page and follow them. This is much harder than it sounds, as you want to make sure the crawl won’t continue forever.
  • Parsing – Parse the information on the page. Try to extract the metadata and add it to the ‘real’ content (what counts as real content depends on your goals for the crawler).
  • Analysis – Normalize the information from the page and put it in storage (a DB, a file, a cloud solution, etc.).
  • Logging & Monitoring – As this server-side process will run while you are sleeping, it’s best to have a good ‘watchdog’ on it. Start with simple logging and analysis of the logs. The second step is to use a tool to monitor the process.
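
To make the Discovery point concrete, here is a minimal sketch of one way to keep a crawl from running forever: remember every URL already visited, stay on the seed's host, and cap both the link depth and the total number of pages. It is not the sample from the hackathon. The names `Fetch`, `extractLinks`, `crawl`, `httpFetch`, `maxDepth`, `maxPages` and the `example.test` pages are all made up for illustration, and the regular expression is only a stand-in for a proper HTML parser.

```dart
import 'dart:collection';
import 'dart:convert';
import 'dart:io';

/// Hypothetical fetcher type: returns the HTML body for a URL.
typedef Fetch = Future<String> Function(Uri url);

/// Pulls href values out of anchor tags with a regular expression and
/// resolves them against the page URL. Fragments (#...) are dropped.
List<Uri> extractLinks(String html, Uri base) {
  final href =
      RegExp(r'''<a\s[^>]*href\s*=\s*["']([^"'#]+)''', caseSensitive: false);
  final links = <Uri>[];
  for (final match in href.allMatches(html)) {
    final parsed = Uri.tryParse(match.group(1)!.trim());
    if (parsed == null) continue; // skip malformed references
    final resolved = base.resolveUri(parsed);
    if (resolved.scheme == 'http' || resolved.scheme == 'https') {
      links.add(resolved);
    }
  }
  return links;
}

/// Breadth-first crawl that is guaranteed to stop: each URL is fetched at
/// most once, links are followed only up to [maxDepth], and at most
/// [maxPages] pages are fetched. Returns the URLs in fetch order.
Future<List<Uri>> crawl(Uri seed, Fetch fetch,
    {int maxDepth = 2, int maxPages = 50}) async {
  final visited = <Uri>{};
  final queue = Queue<MapEntry<Uri, int>>()..add(MapEntry(seed, 0));
  final order = <Uri>[];
  while (queue.isNotEmpty && order.length < maxPages) {
    final entry = queue.removeFirst();
    final url = entry.key;
    final depth = entry.value;
    if (!visited.add(url)) continue; // already fetched
    final html = await fetch(url);
    order.add(url);
    if (depth >= maxDepth) continue; // do not go deeper
    for (final link in extractLinks(html, url)) {
      if (link.host == seed.host && !visited.contains(link)) {
        queue.add(MapEntry(link, depth + 1));
      }
    }
  }
  return order;
}

/// One possible real fetcher built on dart:io (assumes UTF-8 pages).
Future<String> httpFetch(Uri url) async {
  final client = HttpClient();
  try {
    final request = await client.getUrl(url);
    final response = await request.close();
    return await response.transform(utf8.decoder).join();
  } finally {
    client.close();
  }
}

Future<void> main() async {
  // In-memory stand-in for a site, so the sketch runs without a network.
  final pages = {
    'http://example.test/': '<a href="/a">A</a> <a href="/b">B</a>',
    'http://example.test/a': '<a href="/">home</a> <a href="/c">C</a>',
    'http://example.test/c': '<a href="/d">D</a>',
  };
  final order = await crawl(Uri.parse('http://example.test/'),
      (url) async => pages[url.toString()] ?? '');
  print(order);
}
```

Passing `httpFetch` instead of the in-memory closure would point the same loop at a live site. A real crawler would also want to respect robots.txt and throttle its requests, which this sketch leaves out.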

Key lessons:

  • There is a real need for libraries that make parsing easier: XPath, DOM to Map (or Array), etc.
  • Debugging in the editor could be improved. As a first step, you might want to use a logging library that gives you a lot of information at each step.
  • The editor makes the development phase very pleasant, with warnings on (almost) every mistake you might make. I found it very productive to be back in the good hands of an ‘IDE’.
  • I guess that in the near future we will see some good examples that use the Dart VM on the server. It’s going to be interesting to profile their performance and see where we stand vis-à-vis other modern languages such as Scala.
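
As an illustration of the ‘DOM to Map’ idea from the first lesson, here is a rough sketch that collects a page's `<meta>` tags into a `Map`. The function name `metaToMap` and the sample HTML are made up for this example, and it leans on regular expressions only because no parsing library is assumed. A proper HTML parser would be far more robust against unquoted attributes, comments and odd markup.

```dart
/// Collects <meta name="..." content="..."> and
/// <meta property="..." content="..."> tags into a map keyed by the
/// lower-cased name or property. Only quoted attribute values are handled.
Map<String, String> metaToMap(String html) {
  final metaTag = RegExp(r'<meta\b[^>]*>', caseSensitive: false);
  // Group 1: attribute name, group 2: opening quote, group 3: value.
  final attribute = RegExp(r'''([\w:-]+)\s*=\s*(["'])(.*?)\2''');
  final result = <String, String>{};
  for (final tag in metaTag.allMatches(html)) {
    final attrs = <String, String>{};
    for (final a in attribute.allMatches(tag.group(0)!)) {
      attrs[a.group(1)!.toLowerCase()] = a.group(3)!;
    }
    final key = attrs['name'] ?? attrs['property'];
    final value = attrs['content'];
    if (key != null && value != null) {
      result[key.toLowerCase()] = value;
    }
  }
  return result;
}

void main() {
  const html = '''
<head>
  <meta charset="utf-8">
  <meta name="description" content="A tiny Dart crawler">
  <meta content="Dart Crawler Example" property="og:title">
</head>''';
  print(metaToMap(html));
}
```

Tags without a name or property, such as the `charset` one, are skipped, and attribute order inside a tag does not matter.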

