Problem 1: Pagination
You have a tonne of pages, e.g. listings of products in a particular category, which are essentially identical aside from which subset of your products appears on each one. You want to ensure that you rank, but you also do not want to run into duplicate content issues, or waste your crawl budget letting Google crawl hundreds of pages which add no value. Then again, maybe you do want them crawled to ensure the products themselves are indexed? Or maybe you want them crawled, but you know the same products are listed in different groups and sequences across various categories and you are worried about the implications of that. If only you could just tell Google: “Hey! These pages are paginated listings, so please treat them accordingly!”
This is a common scenario for many sites, especially eCommerce sites; yet, common as it is, it is still something we see many clients struggling with. Furthermore, clients often have some specific site quirk or preference which makes it less straightforward than it should be.
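One way to send exactly that signal is the rel="next" and rel="prev" link relations, which mark a URL as part of a paginated series. A minimal sketch, assuming a hypothetical category paginated at /category/widgets?page=N (the URLs are purely illustrative):

    <!-- In the <head> of /category/widgets?page=2 -->
    <!-- Points to the previous and next pages in the series -->
    <link rel="prev" href="http://www.example.com/category/widgets?page=1">
    <link rel="next" href="http://www.example.com/category/widgets?page=3">

The first page of the series would carry only rel="next", and the last page only rel="prev".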
Problem 2: Page structure
For years now we have been reminded over and over to focus on semantic HTML. Originally the reasoning was that content is far easier to render across devices and formats when it is neatly separated: HTML for content and meaning, CSS for presentation and style, and JavaScript for additional behaviour. Removing anything from your HTML that was there purely for presentation was not too difficult, but fully defining the meaning of the content with HTML was pretty much impossible - HTML simply wasn’t a rich enough language. Microformats started flooding in to fill some of the gaps, but the fact is that HTML remained ill-equipped for the task.
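HTML5’s structural elements go some way towards fixing this, letting you mark up what a chunk of the page actually is rather than just how it looks. A minimal sketch (the site name and content are made up for illustration):

    <body>
      <header>
        <h1>Acme Widgets</h1>
        <nav><!-- primary site navigation --></nav>
      </header>
      <article>
        <h2>Blue Widget</h2>
        <time datetime="2012-06-01">1 June 2012</time>
        <p>Product description lives here.</p>
      </article>
      <aside><!-- related products and other tangential content --></aside>
      <footer><!-- site-wide footer --></footer>
    </body>

Elements like <header>, <nav>, <article> and <aside> describe the role of each block, which is exactly the kind of meaning a page built entirely of <div>s could never convey.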
Problem 3: Internal search pages
What happens if you Google Bing’s results page for Googling Bing? Well, nothing, actually, because they block it with robots.txt; but my point is that when one search engine starts crawling another search engine’s results pages, the universe gets uneasy.
Now, if you have an internal search feature on your site, the standard answer is to block it with robots.txt and stop the hellish nightmare that can otherwise ensue. However, some sites blend the search feature into weird navigation systems, or even use search results pages as the way to list certain product categories which they then link to. The best solution is to fix the site IA and make this a non-issue, but that isn’t always as easy as it should be.
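For the standard case, the robots.txt rule is a one-liner. A minimal sketch, assuming the internal search lives under a hypothetical /search path (adjust it to wherever your search results actually live):

    # robots.txt - keep crawlers out of internal search results
    User-agent: *
    Disallow: /search

Bear in mind that robots.txt blocks crawling, not indexing: a blocked URL that is linked to from elsewhere can still show up in the index, just without a snippet.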
Problem 4: Microformats != schema.org
Microformats and RDFa are two formats for embedding machine-readable metadata into our web pages, and both are quite well known in the SEO community.
Microdata is another such format, and is part of the HTML5 spec, but it has remained somewhat in the shadows and hasn’t seen the widespread adoption of the other two.
Schema.org is not a format or a language in itself; it is actually a vocabulary which the search engines have all agreed to understand and respect. It lays out what types of entities and attributes you can embed in the metadata on your web pages, with the guarantee that all the engines will understand them.
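Here is a minimal sketch of the schema.org vocabulary expressed in microdata, using a hypothetical product whose name and price are invented for illustration:

    <div itemscope itemtype="http://schema.org/Product">
      <h2 itemprop="name">Blue Widget</h2>
      <p itemprop="description">A perfectly ordinary blue widget.</p>
      <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
        <span itemprop="priceCurrency" content="GBP">£</span>
        <span itemprop="price">19.99</span>
      </div>
    </div>

The itemscope, itemtype and itemprop attributes come from the HTML5 microdata spec; the Product and Offer types and their property names come from the schema.org vocabulary - that separation is exactly the format-versus-vocabulary distinction described above.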
Problem 5: AJAX and URLs
This one is well known and disliked by pretty much every SEO that there ever was. AJAX sites are really nice for users and improve the user experience greatly... right up to the moment the user tries to bookmark the page they are on, or email it to someone, or share it via social media, or use the back button, or find the page in their history the next day.
AJAX and SEO simply were never designed to mix, and now we are in a world where people want both. If you have somehow managed to avoid this problem and aren’t aware of it, then I’ll briefly outline it: AJAX allows a web page to use JavaScript to update its contents without actually reloading. A new HTTP request is sent and the new content typically replaces some old content on the page, but because the page does not reload, the URL does not change.
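To make that concrete, here is a minimal sketch of the problem; the /products?page=2 endpoint and the element IDs are hypothetical:

    <div id="product-list"><!-- current page of products --></div>
    <button id="load-more">Load more</button>
    <script>
      document.getElementById('load-more').addEventListener('click', function () {
        // Fetch the next page of products and swap it into the page...
        fetch('/products?page=2')
          .then(function (response) { return response.text(); })
          .then(function (html) {
            document.getElementById('product-list').innerHTML = html;
            // ...but the address bar still shows the original URL, so
            // bookmarks, shares and the back button all point at the wrong state.
          });
      });
    </script>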
The traditional method of addressing this, to ensure Googlebot can spider the content, is simply to hook the AJAX calls to traditional <a> tags whose href points to a version of that same content which Google can pick up (and far too often even this hasn’t been done - meaning the content is stranded and will never get indexed). This is fine for the crawling aspect of SEO, but nowadays we also need to consider that social shares are an important aspect of SEO, and if the user can’t copy and paste the correct URL then you are already handicapped.
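A minimal sketch of that approach, extended with the HTML5 History API so the address bar keeps up with the content; the URLs, IDs and endpoint are again hypothetical:

    <a href="/category/widgets?page=2" class="ajax-page">Page 2</a>
    <script>
      document.querySelector('a.ajax-page').addEventListener('click', function (event) {
        event.preventDefault();  // crawlers and no-JS users still follow the href
        var url = this.getAttribute('href');
        fetch(url)
          .then(function (response) { return response.text(); })
          .then(function (html) {
            document.getElementById('product-list').innerHTML = html;
            // pushState updates the address bar without a reload, so the
            // URL the user bookmarks or shares matches the content on screen.
            history.pushState({page: 2}, '', url);
          });
      });
    </script>

In practice you would also listen for the popstate event so the back button restores the right content, and the server should return a full page for those URLs when they are requested directly.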