crawl Archives - Panagiotis Tzamtzis | Παναγιώτης Τζαμτζής

Parsing web sitemaps using JavaScript

Panagiotis 15 Feb 2021 Coding 5714

Recently I came across an interesting project by Sean Thomas Burke called Sitemapper. It is a small library that parses sitemap XML files and returns all the URLs they contain. This is useful when crawling websites, as the sitemap (usually) holds an up-to-date list of all the website's URLs. In most cases this list is enough when designing a crawler, so you don't need to manually crawl the website to build a list of URLs yourself.

Sitemap parser: Sitemapper

Sitemapper is a well-maintained, well-documented, open-source library offering the following features:

Follows redirects
Supports gzip sitemaps...
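To make the idea concrete, here is a minimal sketch of what "parsing a sitemap to get all included URLs" means. It is not Sitemapper's implementation or API: the function names `extractSitemapUrls` and `decodeXmlEntities` and the sample XML are made up for illustration, and the sketch assumes a plain (non-gzipped) sitemap already loaded into a string, with no CDATA sections.

```javascript
// Hypothetical helper (not part of Sitemapper): decode the five predefined
// XML entities. "&amp;" is handled last so "&amp;lt;" is not decoded twice.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Hypothetical helper (not part of Sitemapper): collect the text of every
// <loc> element in a sitemap XML string, trimming surrounding whitespace.
function extractSitemapUrls(xml) {
  const locPattern = /<loc>\s*([\s\S]*?)\s*<\/loc>/g;
  const urls = [];
  for (const match of xml.matchAll(locPattern)) {
    urls.push(decodeXmlEntities(match[1]));
  }
  return urls;
}

// Made-up sample sitemap, used only to exercise the sketch.
const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url>
    <loc>
      https://example.com/search?q=a&amp;page=2
    </loc>
  </url>
</urlset>`;

console.log(extractSitemapUrls(sampleXml));
```

A regular expression is enough for a sketch like this, but a real crawler would more likely rely on a proper XML parser or on a library such as Sitemapper, which also takes care of fetching the file over HTTP.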


