As we move away from a simple stack of PostGIS/GeoServer/GeoWebCache/OpenLayers to wrapping a MapFish print service into the stack, it’s time to think more seriously about optimizing and stabilizing GeoServer.
In preparation for this step, I’ve been setting up a series of VMware ESX-hosted Debian Linux VMs to function as the cluster of geospatial services.
Fortunately for me, there’s plenty of great advice in Andrea Aime’s 2009 FOSS4G presentation GeoServer in Production. Here’s what I gleaned from the presentation (any mistakes are undoubtedly mine and not Aime’s), plus a little bit of expansion from me:
1) Control the requests coming into the system. Here Andrea talks about limiting requests at the application container, e.g. capping Tomcat’s concurrent request threads at 20 instead of the default 200.
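For Tomcat, that cap lives on the HTTP connector in conf/server.xml. The fragment below is a minimal sketch rather than the configuration from the slides: the port, timeout, and redirect values are just Tomcat's stock defaults, and maxThreads is the only attribute that matters here.

```xml
<!-- conf/server.xml: cap request-processing threads at 20 (Tomcat's default is 200) -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="20"
           redirectPort="8443" />
```

Requests beyond the cap wait in the connector's queue instead of piling onto GeoServer all at once, so Tomcat needs a restart and a bit of load testing before you settle on a number.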
2) Set up a high availability (HA) cluster. There are lots of ways to skin this beast, but a cheap and easy way is via the Ultimate Cheapskate Cluster. In Aime’s presentation, this is using vrrpd + balance, but with the current option of using “Pen”, stateful protocols like WFS-Transactional should be supported in addition to WMS.
3) Set up your Java virtual machines intelligently. Most of this information gets covered in GeoServer’s documentation page Running in a Production Environment. Additions from Andrea’s presentation which might still be relevant are the following JVM flags:
The first one tells the virtual machine to expect many temporary objects. The second one makes the JVM use experimental optimizations, so test for stability before using it in a production environment.
(FYI, for the newbies like me out there: JVM flags for Tomcat are set in the catalina.sh startup script.)
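Rather than editing catalina.sh directly, you can put the flags in a bin/setenv.sh file, which catalina.sh sources at startup if it exists. Here is a minimal sketch. The two flags are assumptions, not copied from the slides: -XX:NewRatio=2 and -XX:+AggressiveOpts are HotSpot options that match the two descriptions above, so check them against the presentation and your own JVM before relying on them.

```shell
# $CATALINA_BASE/bin/setenv.sh
# catalina.sh sources this file at startup if it is present.

# Assumed flags (illustrative, not taken from the presentation):
#   -XX:NewRatio=2       old:young generation ratio of 2:1, i.e. a young
#                        generation of one third of the heap, which suits
#                        workloads that create many short-lived objects
#   -XX:+AggressiveOpts  enables experimental optimizations; newer JVMs
#                        may not accept it, so test before production
CATALINA_OPTS="${CATALINA_OPTS:-} -XX:NewRatio=2 -XX:+AggressiveOpts"
export CATALINA_OPTS
```

If the JVM refuses to start with an "unrecognized option" message, remove the offending flag and restart.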
3a) This gets a special subheading, ’cause I couldn’t figure out why my WMS rendering was slow and unstable when I switched from Windows to Linux: Install and Use JAI & JAI Image I/O.
4) Finally, make sure your data are structured properly. If it’s really big imagery (>2GB), use an Image Pyramid, but otherwise, take advantage of internal tiling and overviews. Examples using GDAL’s utilities include
gdal_translate -of GTiff -co "TILED=YES" utm.tif utm_tiled.tif
which creates internal tiling, and
gdaladdo -r average utm_tiled.tif 2 4 8 16 32 64 128 256
which adds overviews. You might also look to optimize the size of internal tiling.
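To experiment with the internal tile size, GDAL's GeoTIFF driver takes BLOCKXSIZE and BLOCKYSIZE creation options alongside TILED=YES. The helper below is a sketch: the function name retile and the 512 default are my own placeholders, not recommendations from the presentation.

```shell
# retile INPUT OUTPUT [BLOCK]
# Writes a tiled GeoTIFF with BLOCK x BLOCK internal tiles, then adds overviews.
# BLOCK must be a multiple of 16 for GeoTIFF; 512 is only a starting point.
retile() {
  in="$1"
  out="$2"
  block="${3:-512}"
  gdal_translate -of GTiff -co "TILED=YES" \
    -co "BLOCKXSIZE=$block" -co "BLOCKYSIZE=$block" "$in" "$out" &&
  gdaladdo -r average "$out" 2 4 8 16 32 64 128 256
}
```

For example, retile utm.tif utm_tiled_512.tif 512 produces one candidate; running gdalinfo on the output reports the block size and overview levels, and timing a few WMS requests against each candidate tells you which size suits your data.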
For vector data, use PostGIS (not shapefiles), and index on your geometry and any attributes that are used in your SLD as filters. Also, show simple symbology when “zoomed out”, and reserve the complex rules for closer zoom levels.
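As a sketch of the indexing step, here is what that might look like through psql. Every name is hypothetical: a database called gis, a table called parcels with a geometry column geom, and an attribute land_use that the SLD filters on.

```shell
# index_layer: add a spatial index and an attribute index, then refresh stats.
# All database, table, and column names below are hypothetical.
index_layer() {
  psql -d gis <<'SQL'
-- Spatial index so the bounding-box queries behind WMS requests avoid full scans
CREATE INDEX parcels_geom_idx ON parcels USING GIST (geom);
-- Plain B-tree index on an attribute used in an SLD filter
CREATE INDEX parcels_land_use_idx ON parcels (land_use);
-- Refresh planner statistics so the new indexes get considered
ANALYZE parcels;
SQL
}
```

Call index_layer once after loading the data, and repeat the attribute index for each column your SLD rules filter on.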
An alternative that Aime doesn’t mention: for really complicated data, you can create generalized copies of the geometry in alternate columns, which is the vector equivalent of overviews. The SLD can then be coded to use the simplified geometry at coarser scales (see e.g. this post for info on how to specify the geometry column in an SLD). I wish I could find the GeoServer post that originally advocated this technique…
Hopefully this helps stabilize, optimize, and increase the availability of your GeoServer instance. Hopefully it does so for mine as well…
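One way to build such a column, sketched with hypothetical names and values: a table parcels in a database gis, a new column geom_gen, SRID 26917, and a tolerance of 50 map units. The typed geometry(...) column syntax assumes PostGIS 2.0 or later.

```shell
# add_generalized_geom: store a simplified copy of each geometry in a second column.
# Names, SRID, and tolerance are placeholders; pick a tolerance to match the
# scale at which the simplified layer will be drawn.
add_generalized_geom() {
  psql -d gis <<'SQL'
ALTER TABLE parcels ADD COLUMN geom_gen geometry(MultiPolygon, 26917);
-- ST_Multi keeps the result a MultiPolygon even when simplification
-- leaves a single polygon
UPDATE parcels
   SET geom_gen = ST_Multi(ST_SimplifyPreserveTopology(geom, 50));
CREATE INDEX parcels_geom_gen_idx ON parcels USING GIST (geom_gen);
SQL
}
```

In the SLD, the rule for coarse scales (bounded with MinScaleDenominator) can then name geom_gen in its symbolizer’s Geometry element, while the detailed rules keep using the original column.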
4 thoughts on “GeoServer Optimization”
I found that “-XX:+AggressiveOpt” is an unrecognized option in my JRE 1.6.0_23.
Another bit of info on configuration of the ultimate cheapskate cluster can be found here:
PostGIS vs SHP+GWC – is there really a difference in performance?
Depends. If the data are quite static, shp + GWC is a fine option for many use cases.