I had a scan through the search here but couldn’t find any reference to this so apologies if it’s been mentioned before.
So I, like many others, employ mod_rewrite to make my .html extensions disappear. As a result I prefer to give Google my non-.html URLs for indexing. The excellent sitemap generation tool obviously creates the sitemap with the extensions intact. That’s not actually a problem, as it’s a simple matter to remove the extensions manually on the server … but it is another job to do, especially if you re-upload the Sparkle site and it regenerates the sitemap.
So, as a suggestion, it would be nice to have an option to ‘remove the extension for clean URLs’ in the search engine settings. Unless, of course, I’m missing something silly.
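For anyone doing the manual step described above in the meantime, it could be scripted rather than edited by hand. This is only a rough sketch: the function names are made up for illustration, it assumes the sitemap lists each page inside a standard <loc> element, and it is not something Sparkle provides.

```python
import re
from pathlib import Path


def strip_html_from_sitemap(xml_text: str) -> str:
    """Remove a trailing .html from every <loc> URL in a sitemap.

    Only the text inside <loc>...</loc> is touched. Note that a URL
    ending in /index.html becomes /index, so the server rewrite rules
    need to handle that case too.
    """
    return re.sub(r"(<loc>[^<]*?)\.html(\s*</loc>)", r"\1\2", xml_text)


def rewrite_sitemap_file(path: str) -> None:
    """Hypothetical helper: rewrite a sitemap.xml file in place."""
    sitemap = Path(path)
    cleaned = strip_html_from_sitemap(sitemap.read_text(encoding="utf-8"))
    sitemap.write_text(cleaned, encoding="utf-8")
```

Calling rewrite_sitemap_file("sitemap.xml") after each upload would redo the edit, but it still has to be re-run every time the sitemap is regenerated.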
Having clean URLs is a common need, but a number of things need to go right for that to work. Since mod_rewrite has a fairly complex syntax and a misconfiguration can easily break your site, we can’t really rely on that. It’s also essentially impossible to parse an existing .htaccess or server config to determine what it does. To top it off, this only applies to Apache servers, which, while overwhelmingly popular and constituting the majority of web servers, are still not universal. Another downside is that Sparkle will generate internal links with the trailing .html, so you either edit all page files or add a redirect on the server.
This is why we suggest placing a page in a folder and naming the file index.html, so that instead of a URL such as /something.html the clean variant will be /something/, and /something will work as well (with a redirect). This works perfectly with Sparkle’s internal link and sitemap.xml generation.
Another option, which is a little cleaner and faster but requires a server config change, is to strip the .html from the page file name in Sparkle. Sparkle then knows the pages as /something, and will generate links and sitemap.xml accordingly. The catch is that the server uses the .html file extension to determine the content type, i.e. to serve the file as text/html. Most servers default to text/plain or, worse, application/octet-stream if an extension is not present, which throws off browsers, so the server has to be told to serve these files as text/html.
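To make the folder approach concrete, here is a small sketch of the mapping from a flat page name to the folder layout. The function name folder_layout is hypothetical and this only illustrates the naming scheme, not anything Sparkle runs internally.

```python
from pathlib import PurePosixPath


def folder_layout(page_url: str) -> tuple[str, str]:
    """Map a flat page URL to (file path on the server, clean URL)."""
    # /something.html -> /something
    folder = PurePosixPath(page_url).with_suffix("")
    # The page lives in its own folder as index.html, and the clean URL
    # is the folder itself with a trailing slash.
    return f"{folder}/index.html", f"{folder}/"


print(folder_layout("/something.html"))
# ('/something/index.html', '/something/')
```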
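The extension-to-content-type lookup can be illustrated with Python's standard mimetypes module. This is only an analogy for the kind of table a web server consults, not the server's actual code, and the helper name guess is made up for the example.

```python
import mimetypes


def guess(name: str):
    """Return the content type guessed from the file name, or None."""
    content_type, _encoding = mimetypes.guess_type(name)
    return content_type


for name in ("something.html", "something"):
    print(name, "->", guess(name))
# something.html -> text/html
# something -> None
```

With no extension there is nothing to look up, which is why the server falls back to whatever default it has been configured with.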
Absolutely understood about the complexities of configuring mod_rewrite, but this doesn’t only apply to Apache. Windows servers running IIS have their own rewrite modules that, although different, essentially do the same thing and allow clean URLs.
My thinking is that the web designer has already configured the hosting files and underlying technology, and understands the mechanism for rewriting clean URLs server side. Sparkle can then be employed to create and publish the normal .html pages. In this case, the option to simply remove the .html extension during sitemap creation would be a welcome one (on the understanding that the designer knows the implications).
Your suggestion of going down the folder/index route is of course great. I have played with this and it works well but some developers are stuck in their ways and prefer other methods lol.
My affair with Sparkle has grown to the extent that I’ve decided to do away with my Windows VPS completely and go the less time-consuming Apache cloud hosting route. I’ve achieved so much in such a short time compared to Windows.
I get it, but the DIY aspect of mod_rewrite doesn’t work well with Sparkle’s attempt at presenting a uniform, consistent interface for all setups. Embracing mod_rewrite is possible or even inevitable, but, as mentioned, not in the form of adopting an existing configuration (due to the infinite variations you can use on regular expressions etc.); more likely it would mean producing a known working one.
Also, as mentioned, while you might fix the sitemap.xml to remove the extension, internal links still have the extension, and fixing that is a lot more editing.
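To give a sense of what that extra editing involves, here is a rough sketch of stripping .html from internal links in an exported page. It assumes links are written as double-quoted href attributes, which has not been checked against Sparkle's actual output, and the function name is hypothetical. External links are left alone.

```python
import re

# Match href="...html" for internal links only: skip absolute URLs,
# protocol-relative URLs and mailto: links, and keep any #fragment
# or ?query that follows the extension.
INTERNAL_HTML_LINK = re.compile(
    r'(href="(?!https?://|//|mailto:)[^"#?]*?)\.html(?=["#?])'
)


def strip_html_from_links(page_html: str) -> str:
    """Remove the .html extension from internal href attributes."""
    return INTERNAL_HTML_LINK.sub(r"\1", page_html)


print(strip_html_from_links('<a href="about.html">About</a>'))
# <a href="about">About</a>
```

This would have to be run over every exported page after every publish, which is the kind of fragile post-processing a built-in, known working configuration would avoid.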