Saturday, November 12, 2022

[FIXED] How to efficiently serve massive sitemaps in django

Tags: caching, django, memcached, sitemap

Issue

I have a site with about 150,000 pages in its sitemap. I'm using the sitemap index generator from Django's sitemap framework to build the sitemaps, but I need a way to cache them, because building 150 sitemaps of 1,000 links each is brutal on my server.[1]

I could cache each of these sitemap pages with memcached, which is what I'm using elsewhere on the site. However, there are so many sitemaps that they would completely fill memcached, so that doesn't work.

What I think I need is a way to use the database as the cache for these, and to regenerate a sitemap page only when its contents change. Because the sitemap index paginates the links in order, that means only the latest couple of sitemap pages ever change; the rest always stay the same.[2] But, as near as I can tell, I can only use one cache backend with Django.

How can I have these sitemaps ready for when Google comes-a-crawlin' without killing my database or memcached?

Any thoughts?

[1] I've limited it to 1,000 links per sitemap page because generating the max, 50,000 links, just wasn't happening.

[2] For example, if I have sitemap.xml?page=1, sitemap.xml?page=2, ... sitemap.xml?page=50, I only really need to regenerate sitemap.xml?page=50 until it is full with 1,000 links. Then I can cache it pretty much forever and focus on page 51 until it's full, cache that forever, and so on.
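
The rule in footnote [2] can be sketched in a few lines of Python. This is only an illustration, not part of Django: the function name `pages_to_regenerate` and the `cached_pages` mapping (page number to the number of links in the cached copy) are hypothetical, and the sketch assumes links are only ever appended, never removed or reordered.

```python
import math


def pages_to_regenerate(total_links, cached_pages, page_size=1000):
    """Return the 1-based sitemap page numbers that need rebuilding.

    cached_pages maps a page number to the number of links stored in the
    cached copy of that page (a hypothetical bookkeeping structure).
    Assumes links are append-only, so a page that already holds the
    expected number of links is still valid.
    """
    page_count = math.ceil(total_links / page_size)
    stale = []
    for page in range(1, page_count + 1):
        if page < page_count:
            expected = page_size  # every page before the last is full
        else:
            expected = total_links - page_size * (page_count - 1)
        if cached_pages.get(page) != expected:
            stale.append(page)
    return stale
```

With 2,500 links and a cache holding two full pages plus a third page of 300 links, only page 3 is reported as stale; the full pages are never touched again.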

EDIT, 2012-05-12: This has continued to be a problem, and I finally ditched Django's sitemap framework after using it with a file cache for about a year. Instead I'm now using Solr to generate the links I need in a really simple view, and I'm then passing them off to the Django template. This greatly simplified my sitemaps, made them perform just fine, and I'm up to about 2,250,000 links as of now. If you want to do that, just check out the sitemap template - it's all really obvious from there. You can see the code for this here: https://bitbucket.org/mlissner/search-and-awareness-platform-courtlistener/src/tip/alert/casepage/sitemap.py
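
For readers who want to try the file-cache route mentioned in the edit above: current Django versions accept several named caches in the `CACHES` setting, so memcached can stay the default while sitemap pages go to a separate backend. The fragment below is a sketch under assumptions: the alias `sitemaps`, the cache directory, and the `MAX_ENTRIES` value are all hypothetical, and the memcached backend class shown is the one shipped with recent Django releases (older releases use a different class name).

```python
# settings.py fragment: two caches side by side.
CACHES = {
    # Existing memcached cache used by the rest of the site.
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    },
    # Hypothetical second cache reserved for rendered sitemap pages.
    "sitemaps": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_sitemap_cache",  # assumed path
        "TIMEOUT": None,  # None means entries never expire
        # The default limit is 300 entries; raise it so that pages are
        # not culled once there are more sitemap pages than that.
        "OPTIONS": {"MAX_ENTRIES": 10000},
    },
}
```

Code that needs the second cache would then fetch it by alias with `from django.core.cache import caches` and `caches["sitemaps"]`, while the default cache keeps its existing behavior.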


Solution

I had a similar issue and decided to have Django write the sitemap files to disk in the static media directory and let the web server serve them. I regenerate the sitemap every couple of hours, since my content doesn't change more often than that. How often you need to rewrite the files will depend on your content.

I used a custom Django management command with a cron job, but curl with a cron job is easier.

Here's how I use curl. Apache serves /sitemap.xml as a static file, not through Django:

curl -o /path/sitemap.xml http://example.com/generate/sitemap.xml
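
If you go the management-command route instead, the part that writes the file could look roughly like the sketch below. It uses only the standard library; `render_sitemap` and `write_atomically` are hypothetical helper names, not Django APIs. The file is written to a temporary name and then renamed into place, so a crawler never sees a half-written sitemap, which is something a plain `curl -o` to the live path does not guarantee.

```python
import os
import tempfile
from xml.sax.saxutils import escape


def render_sitemap(urls):
    """Build a minimal sitemap document containing one <loc> per URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() converts &, < and > so the XML stays well-formed.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write text to path via a temporary file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # mkstemp creates the file readable by its owner only; relax
        # the mode so the web server's user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        # Only reached before a successful rename, so the temp file
        # still exists and can be cleaned up.
        os.unlink(tmp_path)
        raise
```

A cron-driven command would call `write_atomically("/path/sitemap.xml", render_sitemap(urls))` with whatever list of URLs your site produces.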


Answered By - dar
Answer Checked By - Clifford M. (PHPFixing Volunteer)