Here's my impression of some of the talks at the FLOSS Spring conference 2013. There were more talks than this but, present company excepted, these are the ones which made an impression on me, and during which I took notes!

The unofficial theme of the conference this year was monitoring. The unspoken starting point of most of the monitoring talks seemed to be that Nagios doesn't scale at all well. Presenters of monitoring talks seemed to be competing to emphasise just how well their solutions would scale to practically infinite amounts of monitoring.

John Hackett of Bytemark gave a talk on Custodian and MauveAlert. Custodian is a monitoring system and MauveAlert is a notification system which can be used with it. Both were developed at Bytemark and can be downloaded from projects.bytemark.co.uk. The software is written in Ruby and the configuration files are parsed as Ruby, so this might be good for Ruby enthusiasts.

Ansible
Jan-Piet Mens, Germany – A short introduction to Ansible

Ansible is a stateless, daemonless configuration system making heavy use of ssh. It seems simple, elegant and increasingly popular.
http://ansible.cc/

Icinga
Bernd Erk, Netways GmbH, Germany - ICINGA - Open Source Monitoring to the Next Level

The slides are at https://blog.netways.de/wp-content/uploads/2013/03/icinga_flossuk_2013.pdf
https://www.icinga.org
https://en.wikipedia.org/wiki/Icinga

Bernd talked about Icinga at last year's conference too; last year's slides are at http://blog.netways.de/wp-content/uploads/2012/03/icinga_dwtw_flossuk_20121.pdf

This year's talk was just as impressive. Icinga was forked from Nagios in 2009. With Icinga 1 they've tried hard to preserve Nagios compatibility while improving on Nagios. (And by the way, apparently Nagios has now changed to an "open source but you can't fork it" licence!) Icinga 2, which is being developed in parallel with Icinga 1, is a complete rewrite and is a lot more modular and distributed.
A note about pronunciation: although Bernd pronounced it "ee-SING-ga", Wikipedia claims that it's a Zulu word which is pronounced with a click consonant. Good luck.

Jeff Gehlbach - OpenNMS

"OpenNMS is the world’s first enterprise grade network management application platform developed under the open source model." It can handle thousands of network events per second; it discovers what's on the network, decides what's important and alerts you when it loses contact with it; or you can configure it yourself.

OpenNMS seems to be a little like Best Practical/RT or Red Hat: the software is open source and is given away for free, but there's a company behind it which makes money from consulting, help and custom solutions. The software is at http://www.opennms.org and the company is at http://www.opennms.com.

Most of the monitoring talks touched on SNMP at one point, mostly to comment on how difficult it was to use in practice, however good it might be in theory. Jeff Gehlbach disagreed. He said to use SNMP where you can, as it scales well and is well standardised and controlled. Net-SNMP multiplies the power of OpenNMS, he said.

This talk featured the best picture of the conference: http://www.yiyinglu.com/failwhale/images/FWPA.jpg

Monitoring at Scale - Aaron Brady, iWeb

This was a very enjoyable talk, and thought-provoking too. iWeb is a small hosting company. It's been through several different monitoring solutions. They started off with no monitoring at all. They then started using pingdom.com. They moved on to using SNMP, but found it awkward.

Since the company was small enough that all technical staff worked in the same room, at this point they got serious about their monitoring: they bought a large TV screen and a disco light for the office and used them to summarise the current aggregate health of their servers in one of five levels, "Defcon One" to "Defcon Five", with changes in level being announced by a synthesised macho American voice.
They also started using Nagios and Munin. This was the beginning, Aaron said, of a four-year text onslaught. They then started configuring hosts with Puppet and Django. They moved from Nagios to Icinga so that they could move most of their monitoring from (resource-swallowing) active checks to passive checks. To this they added collectd and their own notification system called Retcon (an under-the-radar Dr Who reference, I'm told).

He highly recommends collectd; it's a non-forking, non-threaded C program which therefore doesn't grow to swallow your computer whole. It collects monitoring data extremely efficiently and can do it in 10 second quanta rather than the 5 minutes or more that is normal for Nagios.

iWeb had reached the point where the staff could be notified of any problem on any of their hundreds of servers within a few seconds of it happening. This exposed lots of problems which hadn't been evident with the previous 5 minute quanta. In theory the new system enabled them to offer their customers excellent service, but in practice things didn't work out that way: firstly, the staff quickly wilted under the intolerable load of text messages, and secondly, a lot of the problems turned out to be transient, solving themselves within a minute or two without intervention.

Talking frankly with their customers, they found to their surprise that the customers had a far more relaxed view of what outage was acceptable than they themselves had had. After this revelation they were able to scale back their alert system until it was almost entirely silent, just alerting them to a few extremely important issues. The text onslaught was over. However, they retained the underlying 10 second quanta monitoring system because it could produce extremely useful and illuminating graphs for them.

Jan-Piet Mens, Germany - Multiple choice: DNS servers

Jan-Piet Mens is a great speaker. I went to this talk just because he was presenting it.
The talk summarised the current choice of DNS servers available: which ones are best for which kinds of DNS serving jobs at what sort of scale. As well as rattling through the characteristics of several dozen DNS servers, he had two memorable things to say: if you're happy enough with your current DNS server, don't change it; and "DNS is easy, but DNS is complicated".

He's written a 700+ page book about DNS servers and how to deploy them, and he's giving it away free (http://mens.de/:/book). For DNS news he recommends the news aggregator site dnssexy.net. I recommend his entertaining Twitter feed at http://twitter.com/jpmens.

Bernd Erk, Netways GmbH, Germany - Managing Enterprise Clouds with OpenNebula

The slides are at https://blog.netways.de/wp-content/uploads/2013/03/opennebula_flossuk_2013.pdf
http://opennebula.org

OpenNebula is "an enterprise-ready open-source platform for sysadmins and devops to manage cloud data centers". I'd summarise it as "kvmtool on steroids". (Although it supports Xen and VMware as well as KVM.)

OpenNebula is particularly impressive when used with Ceph (https://en.wikipedia.org/wiki/Ceph_%28storage%29, http://ceph.com/), which is a self-managing, self-healing, self-optimising distributed object store and filesystem. If some Ceph storage goes down, data isn't lost: Ceph simply re-replicates the affected content onto the remaining storage to bring the replication factor back up to a safe level. It seems rather like Hadoop's HDFS, only more flexible and intelligent.

Using Ceph, you can put _all_ of your VM images into one single pool which is shared between all of your VM servers. Since the VM image never needs to move, migration with OpenNebula then becomes a matter just of migrating the currently running process from one machine to another, which takes a few seconds. Migrated VMs also don't need to reboot afterwards, since they haven't moved to a different storage pool.
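The self-healing idea described above for Ceph can be illustrated with a toy model. This is only a sketch of the general principle of restoring a replication factor after a storage node fails; it is not how Ceph actually places data, and the node names, image names and the `place_replicas` and `heal` helpers are all made up for illustration.

```python
import random


def place_replicas(objects, nodes, factor, rng):
    """Assign each object to `factor` distinct nodes chosen at random (toy placement)."""
    return {obj: set(rng.sample(sorted(nodes), factor)) for obj in objects}


def heal(placement, live_nodes, factor, rng):
    """Forget replicas on dead nodes, then copy each object onto other live
    nodes until it is back to `factor` replicas (or no spare nodes remain)."""
    healed = {}
    for obj, holders in placement.items():
        survivors = holders & live_nodes
        if not survivors:
            raise RuntimeError(f"all replicas of {obj} were lost")
        candidates = sorted(live_nodes - survivors)
        missing = min(factor - len(survivors), len(candidates))
        healed[obj] = survivors | set(rng.sample(candidates, missing))
    return healed


rng = random.Random(0)  # fixed seed so runs are repeatable
nodes = {"node0", "node1", "node2", "node3", "node4"}  # hypothetical storage nodes
images = ["vm-image-a", "vm-image-b", "vm-image-c"]    # hypothetical VM images

placement = place_replicas(images, nodes, factor=3, rng=rng)
live = nodes - {"node1"}  # simulate one storage node going down
healed = heal(placement, live, factor=3, rng=rng)

for obj in sorted(healed):
    print(obj, sorted(healed[obj]))
```

In this model an image stays available as long as at least one replica survives, and the healing step only has to copy the replicas that were lost rather than the whole pool.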
If a VM server goes down, OpenNebula can notice and boot its VMs up on another server for you automatically. VMs and their servers can be organised into clusters, virtual associations and groups, giving lots of opportunities for controlling servers in distinct ways. For instance, our Forum-based VMs could automatically migrate between any Forum-based VM servers as needed but could be protected from being migrated to other sites.

OpenNebula Sunstone is a GUI web front end for OpenNebula management, should you want one.
http://opennebula.org/documentation:archives:rel2.2:sunstone

Storage Caching - Tim Fletcher, Brighter Connections

This talk was subtitled "Why your SAN is slow", and it was a review of ways to speed up a SAN. Tim was a good speaker and seemed to really know what he was talking about. These are some notes I made during the talk.

SANs are cheap, he said, but can give bad performance as multiple competing workloads can make disks thrash. To make them faster you can add more spindles or more RAM or both. There's also a software approach, which is to use a copy-on-write filesystem such as ZFS. With ZFS, when a file is changed the old version isn't deleted; a new copy is simply added to the filesystem tree, which is quicker.

Disks are good at linear access but bad at random access, and they're power-hungry and fragile. Memory is fast and good at random access but is expensive, volatile and once again power-hungry. Flash is fast, non-volatile and low power. However, it's expensive, can wear out and has firmware, which means that it has firmware bugs. Flash operates either in PCIe mode or in hard disk emulation mode. Fusion IO flash is expensive but extremely good.

There are three different write caching methods to look for: write round, write through and write later.

A couple of kernel modules help with SSD caching. Flashcache was written by Facebook. It's in Ubuntu 13.04 and may appear in RHEL 7. It uses write later caching.
This can give the fastest performance, but the delayed write makes it easier to shoot yourself in the foot.

Bcache is better for full systems. Currently it needs a full kernel rebuild and it needs devices to be rebuilt as "bcache" devices. It's very low level and it's faster than Flashcache. This module is heading towards inclusion in the mainline kernel.

For me a valuable aspect of the conference was the opportunity it gave to spend an evening or two meeting and socialising with people from other parts of Edinburgh University. That's already coming in handy.