Most SEOs know that the job sometimes involves a lot of tedious work, and they lean on powerful tools to get the data they need quickly. The common problem in SEO, however, is that what works for one website does not work for another. Almost every page on every website is constructed differently: some are built really well, others terribly.
There is no ‘one size fits all’ solution in SEO, which is why automation can never be perfect. Fetching data from web pages gets complicated because everything is marked up differently. This is where XPath proves useful: it comes in handy as a way for SEOs to build their own tools and make their tasks more efficient.
In this post, we will discuss what XPath is and how it can make the lives of SEOs easier.
XPath SEO Guide – UPDATED 2019
XML Path Language, or XPath, is a query language for selecting the parts of an XML document that you want to pull data from. It works on HTML pages too, once they have been parsed into a document tree.
Most SEOs think of the internet as a massive database with information on almost every topic in the world. Of course, the internet is not a database but a collection of web pages, often structured very differently from one another. This mixture of frameworks, designs and structures makes it really hard to collect data across different mark-ups.
Basically, XPath lets you create a schema for each website by writing expressions that select the elements you need on that site. A schema keeps gathering data until the website’s architecture undergoes major changes.
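To make the idea of a per-site schema concrete, here is a minimal sketch in Python. The site names, the markup and the "headline" field are all hypothetical, made up for illustration. It uses the standard library's ElementTree, which supports only a subset of XPath (hence the leading ".//") and only parses well-formed markup, so a real scraper would likely need a proper HTML parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical schemas: the same field needs a different XPath on each site.
SCHEMAS = {
    "site-a.example": {"headline": ".//h1[@class='post-title']"},
    "site-b.example": {"headline": ".//div[@id='headline']"},
}

# Hypothetical, well-formed sample pages standing in for fetched HTML.
PAGES = {
    "site-a.example": "<html><body><h1 class='post-title'>Hello from A</h1></body></html>",
    "site-b.example": "<html><body><div id='headline'>Hello from B</div></body></html>",
}


def extract(site, markup):
    """Apply the schema for `site` and return {field: text or None}."""
    root = ET.fromstring(markup)
    return {field: root.findtext(xpath) for field, xpath in SCHEMAS[site].items()}


results = {site: extract(site, markup) for site, markup in PAGES.items()}
print(results)
```

If one of the sites changes its markup, only that site's entry in SCHEMAS needs updating; the extraction code stays the same.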
XPath Cheat Sheet For SEOs
The cheat sheet below contains the expressions SEOs need most often and should serve well for most internet marketing (IM) scraping. Keep in mind that XPath’s features and utility do not stop here: it has numerous other functions and expressions that can be very useful for skilled professionals.
You don’t have to rack your brain trying to understand every aspect of it right away. This cheat sheet is a condensed version to get you started.
Element | XPath |
Page Title | //title |
Meta Description | //meta[@name='description']/@content |
AMP URL | //link[@rel='amphtml']/@href |
Canonical URL | //link[@rel='canonical']/@href |
Robots (Index/Noindex) | //meta[@name='robots']/@content |
H1 | //h1 |
H2 | //h2 |
H3 | //h3 |
All href values in the document (links, stylesheets, canonicals) | //@href |
Any element whose class attribute is exactly 'any' | //*[@class='any'] |
Hreflang attribute values | //link[@rel='alternate']/@hreflang |
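As a rough illustration of how these expressions might be applied, here is a short Python sketch run against a made-up, well-formed sample page (the URLs and text are hypothetical). It uses the standard library's ElementTree, which supports only a subset of XPath: paths start with ".//", and attribute values are read with .get() rather than a trailing /@content or /@href step. A full XPath engine such as lxml would accept the cheat-sheet expressions as written.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; real HTML usually needs a forgiving HTML parser.
SAMPLE = """<html>
  <head>
    <title>Blue Widgets | Example Shop</title>
    <meta name="description" content="Hand-made blue widgets." />
    <meta name="robots" content="index, follow" />
    <link rel="canonical" href="https://example.com/widgets" />
    <link rel="alternate" hreflang="de" href="https://example.com/de/widgets" />
  </head>
  <body>
    <h1>Blue Widgets</h1>
    <a href="/about">About</a>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)


def attr(path, name):
    """Return attribute `name` of the first element matching `path`, or None."""
    node = root.find(path)
    return None if node is None else node.get(name)


page = {
    "title": root.findtext(".//title"),
    "description": attr(".//meta[@name='description']", "content"),
    "robots": attr(".//meta[@name='robots']", "content"),
    "canonical": attr(".//link[@rel='canonical']", "href"),
    "hreflang": [n.get("hreflang") for n in root.findall(".//link[@rel='alternate']")],
    "h1": [n.text for n in root.findall(".//h1")],
    # Stand-in for //@href: every element that carries an href attribute.
    "hrefs": [n.get("href") for n in root.iter() if n.get("href") is not None],
}

for field, value in page.items():
    print(f"{field}: {value}")
```

Note that the "hrefs" list includes the canonical and alternate link URLs as well as the anchor link, because //@href matches the attribute on any element.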
XPath Title – An Example
The page title is one of the most important elements to extract when scraping a website, so we will walk through it here. Keep in mind that every other element or attribute of the page can be extracted in the same way, using the expressions from the cheat sheet above.
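Use: //title
This expression selects the page’s title element, and the text inside it is the page title. It does not return title attributes (those would be //@title), nor CSS or JavaScript references. XPath names are case-sensitive in most tools, so the lowercase //title is the safe form.
The same pattern works for other elements as well.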
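A minimal sketch of the title lookup in Python, using a made-up sample page and the standard library's ElementTree (which needs well-formed markup and a leading ".//"). It also shows two easy mistakes: a capitalised element name finds nothing, and title attributes are a separate thing from the title element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with a <title> element and one title attribute.
root = ET.fromstring(
    "<html><head><title>My Page Title</title></head>"
    "<body><p title='tooltip'>Hi</p></body></html>"
)

title = root.findtext(".//title")   # text of the <title> element
wrong_case = root.find(".//Title")  # element names are case-sensitive here
title_attrs = [n.get("title") for n in root.iter() if n.get("title") is not None]

print(title)
print(wrong_case)
print(title_attrs)
```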
For someone who knows what they are doing, XPath can be an incredibly powerful tool. It is not too sophisticated either: anyone can get the hang of it after a few hours of experimenting. If you get stuck, there are plenty of online developer forums that can help you out!