Scraping HTML with XPath

Scraping HTML with XPath is a booklet that guides you through extracting information from HTML pages using XPath.

About this book

I came up with the idea for this booklet thanks to Peter, who kindly answered a question on the Pharo mailing list. To help, Peter showed a Pharoer how to scrape the web site mentioned in Chapter 2 using XPath. In addition, some years ago I was maintaining Soup, a scraping framework, because I wanted to write an application to manage my magic cards. Since then I have always wanted to try XPath.

I also wanted to offer this booklet to Peter. Why? Because I asked Peter if he would like to write something, and he told me that he had reached a great age at which he would not take on any commitment. I realised that I would like to get as old as him and still be able to hack like mad in Pharo with new technology. So this booklet is a gift to Peter, a great and gentle Pharoer. I would like to thank Monty, the developer of the XML package suite, for its great implementation and for the feedback on this booklet.

Stef

An Open Book

This book is an open book:

The full book is available as a free printable PDF download (milestone version).

The latest version is available on our book farm.

The content of this book is released under a Creative Commons Attribution-NonCommercial-ShareAlike license.

Authors

Stéphane Ducasse
Peter Kenny

Please contact me if you notice that I wrote something wrong or not fully precise.

You can support Stéphane Ducasse. Thanks in advance.