Thomas Dudziak's Blog

Storm & Esper

At work, we recently started using Esper for realtime analytics, and so far we quite like it. It is great at what it does: running queries continuously over data. The problem then becomes how to get data into Esper. The recently released Storm could be one way to do that, so I got curious and started playing around with it to see whether it could be made to work with Esper. It turns out that the integration is straightforward.

Some Storm basics

Storm has three basic concepts that are relevant in this context: streams, spouts, and bolts. At the core, Storm facilitates data transfer between spouts and bolts using streams of tuples.

Spouts are the basic data emitters, typically retrieving the data from outside of the Storm cluster. A simple example of this would be a spout that retrieves the tweet stream via the Twitter API and emits the tweets as a stream into the Storm cluster.

Bolts are data processors that receive one or more streams as input and potentially also emit (processed) data on one or more streams. In the Twitter example, one could for instance imagine bolts that count the number of tweets per second, or that detect the language of each tweet and re-emit the tweets into per-language streams.

The data in the streams has a simple tuple form consisting of a fixed number of named values called fields. Storm does not care about the data types of the individual fields in a tuple as long as they can be serialized to the wire format (which is Thrift), whether via serializers provided by Storm or custom ones. Spouts and bolts need to declare the number of fields and their names for each of the tuples they are going to emit as part of the initial setup of the topology. This also means that the number of fields and their names are fixed for the duration of a topology run.
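
To make the tuple contract concrete, here is a minimal sketch (not from the original post) of how a spout could declare its fields and emit matching tuples; the field names are the ones used in the Twitter example below, and emitExample is just a hypothetical helper:

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // Every tuple emitted on the default stream has exactly these two named fields.
    declarer.declare(new Fields("createdAt", "retweetCount"));
}

private void emitExample(SpoutOutputCollector collector, long createdAt, int retweetCount) {
    // Emitted values are positional and must line up with the declared field names.
    collector.emit(new Values(createdAt, retweetCount));
}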

Some Esper basics

Esper is, and I’m simplifying things quite a bit here, a processing engine for data streams that processes them by running queries against them. Think of it as a way to run SQL-like queries on data that streams by. The queries run continuously and thus have a time or amount-of-data aspect to them. Continuing the Twitter example from above: if we consider the never-ending stream of tweets as the data stream that Esper works with, then an Esper query could for instance return the number of tweets per second like so:

select count(*) as tps from Twitter.win:time_batch(1 sec)

The time_batch part in this example directs Esper to apply the count function to 1-second batches of events.

Esper data streams consist of structured data called events. The types of these events can be POJOs, maps, and other things. Event types are typically registered with Esper before a query is submitted. This means that you have to tell Esper which kind of event you are going to give it (Java class, map, …) and which properties that event type has. For Java classes, Esper can figure this out itself, but for map events you need to tell Esper explicitly about the possible keys and their value data types. Fortunately, Esper is forgiving when it comes to the data types: you can tell it that you’ll give it Objects, and it will happily accept numbers in the actual data stream and perform numeric operations on them.
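
For illustration, here is a minimal, self-contained sketch (not from the original post) of registering a map event type and running the query above directly against the Esper API; the event type name Twitter and its two properties are assumptions made for this example:

import java.util.HashMap;
import java.util.Map;

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;
import com.espertech.esper.client.EventBean;
import com.espertech.esper.client.UpdateListener;

public class EsperMapEventExample {
    public static void main(String[] args) {
        // Declare a map event type named "Twitter" whose properties are plain Objects.
        Map<String, Object> props = new HashMap<String, Object>();
        props.put("createdAt", Object.class);
        props.put("retweetCount", Object.class);

        Configuration config = new Configuration();
        config.addEventType("Twitter", props);

        EPServiceProvider esper = EPServiceProviderManager.getDefaultProvider(config);

        // The tweets-per-second query from above; print each 1-second batch result.
        EPStatement stmt = esper.getEPAdministrator().createEPL(
            "select count(*) as tps from Twitter.win:time_batch(1 sec)");
        stmt.addListener(new UpdateListener() {
            @Override
            public void update(EventBean[] newEvents, EventBean[] oldEvents) {
                if (newEvents != null) {
                    System.out.println("tps = " + newEvents[0].get("tps"));
                }
            }
        });

        // Events are just maps; numeric values arrive as Objects, which Esper accepts.
        Map<String, Object> event = new HashMap<String, Object>();
        event.put("createdAt", System.currentTimeMillis());
        event.put("retweetCount", 3);
        esper.getEPRuntime().sendEvent(event, "Twitter");
    }
}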

How to combine the two

Storm’s tuples are quite similar to Esper’s map event types. The tuple field names map naturally to map keys and the field values to values for these keys. The tuple fields are not typed when they are defined, but that does not pose a big problem for us as we can simply tell Esper that they are of type Object. In addition, the fact that tuples have to be defined before a topology is run, makes it relatively easy for us to define the map event type in the setup phase.

I am going to use the twitter stream example from the storm-starter project to show how you can use Esper to count the number of tweets per second and also find the maximum number of retweets per 1 second interval. This is probably not of great practical use, but will show off some aspects of the Storm – Esper integration.

An up-to-date version of this code is available on GitHub.

Let’s get started with the Twitter spout, a slightly adapted version of the one from the storm-starter project:

public class TwitterSpout implements IRichSpout, StatusListener {
    private static final long serialVersionUID = 1L;

    private final String username;
    private final String pwd;
    private transient BlockingQueue<Status> queue;
    private transient SpoutOutputCollector collector;
    private transient TwitterStream twitterStream;

    public TwitterSpout(String username, String pwd) {
        this.username = username;
        this.pwd = pwd;
    }

    @Override
    public boolean isDistributed() {
        return false;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("createdAt", "retweetCount"));
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.queue = new ArrayBlockingQueue<Status>(1000);
        this.collector = collector;

        Configuration twitterConf = new ConfigurationBuilder().setUser(username)
                                                              .setPassword(pwd)
                                                              .build();
        TwitterStreamFactory fact = new TwitterStreamFactory(twitterConf);

        twitterStream = fact.getInstance();
        twitterStream.addListener(this);
        twitterStream.sample();
    }

    @Override
    public void onStatus(Status status) {
        queue.offer(status);
    }

    @Override
    public void nextTuple() {
        Status value = queue.poll();
        if (value == null) {
            Utils.sleep(50);
        }
        else {
            collector.emit(tuple(value.getCreatedAt().getTime(),
                                 value.getRetweetCount()));
        }
    }

    @Override
    public void close() {
        twitterStream.shutdown();
    }

    @Override
    public void ack(Object arg0) {}
    @Override
    public void fail(Object arg0) {}
    @Override
    public void onException(Exception ex) {}
    @Override
    public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {}
    @Override
    public void onTrackLimitationNotice(int numberOfLimitedStatuses) {}
    @Override
    public void onScrubGeo(long userId, long upToStatusId) {}
}

This defines a spout that emits a single stream of tuples with two fields, createdAt (a timestamp) and retweetCount (an integer).

You’ll notice that, aside from the Twitter username and password, all fields in the spout are marked as transient and are initialized in the open method. The reason for this is that Storm requires spouts and bolts to be serializable so that it can ship them to the nodes in the Storm cluster before starting the topology.

The Esper bolt itself is generic. You pass it Esper statements and the names of the output fields that these statements will generate. The adapted main method for our Twitter example looks like this:

public static void main(String[] args) {
    final String username = args[0];
    final String pwd = args[1];

    TopologyBuilder builder = new TopologyBuilder();
    TwitterSpout spout = new TwitterSpout(username, pwd);
    EsperBolt bolt = new EsperBolt(
        new Fields("tps", "maxRetweets"),
        "select count(*) as tps, max(retweetCount) as maxRetweets from Storm.win:time_batch(1 sec)");

    builder.setSpout(1, spout);
    builder.setBolt(2, bolt).shuffleGrouping(1);

    Config conf = new Config();
    conf.setDebug(true);

    LocalCluster cluster = new LocalCluster();

    cluster.submitTopology("test", conf, builder.createTopology());
    Utils.sleep(10000);
    cluster.shutdown();
}

Note how the Esper statement returns tps and maxRetweets which are also declared as the two output fields for the bolt.

The bolt code itself consists of three pieces (a version that is kept up-to-date with Storm is in the GitHub repository linked above). The setup part constructs a map event type for each input stream and registers it with Esper (the general Esper setup code is omitted here):

private void setupEventTypes(TopologyContext context, Configuration configuration) {
    Set<GlobalStreamId> sourceIds = context.getThisSources().keySet();
    singleEventType = (sourceIds.size() == 1);

    for (GlobalStreamId id : sourceIds) {
        Map<String, Object> props = new LinkedHashMap<String, Object>();

        setupEventTypeProperties(
            context.getComponentOutputFields(id.get_componentId(),
                                             id.get_streamId()),
            props);
        configuration.addEventType(getEventTypeName(id.get_componentId(),
                                                    id.get_streamId()),
                                   props);
    }
}

private String getEventTypeName(int componentId, int streamId) {
    if (singleEventType) {
        return "Storm";
    }
    else {
        return String.format("Storm_%d_%d", componentId, streamId);
    }
}

private void setupEventTypeProperties(Fields fields, Map<String, Object> properties){
    int numFields = fields.size();

    for (int idx = 0; idx < numFields; idx++) {
        properties.put(fields.get(idx), Object.class);
    }
}

The field-to-property mapping is straightforward. It simply registers properties of type Object in the event type corresponding to the input stream, using the field names as property names. If the bolt has only a single input stream, it registers a single event type called Storm. For multiple input streams, it uses the component id (the id of the spout or bolt that the data comes from) and the stream id (spouts and bolts can emit multiple streams) to generate a name of the form Storm_{component id}_{stream id}.

The second part is the transfer of data from Storm to Esper:

@Override
public void execute(Tuple tuple) {
    String eventType = getEventTypeName(tuple.getSourceComponent(),
                                        tuple.getSourceStreamId());
    Map<String, Object> data = new HashMap<String, Object>();
    Fields fields = tuple.getFields();
    int numFields = fields.size();

    for (int idx = 0; idx < numFields; idx++) {
        String name = fields.get(idx);
        Object value = tuple.getValue(idx);

        data.put(name, value);
    }

    runtime.sendEvent(data, eventType);
}

This method is called by Storm whenever a tuple from any of the connected streams is sent to the bolt. The code therefore first has to find the event type name corresponding to the tuple. Then it iterates over the fields in the tuple and puts the values into a map using the field names as the keys. Finally, it passes that map to Esper.

At this point, Esper routes the map (the event) through the statements, which in turn might produce new data that we need to hand back to Storm. For this purpose, the bolt registered itself as a listener for data emitted by any of the statements configured during setup. Esper will then call back the update method on the bolt whenever one of the statements generates data. The update method basically performs the reverse operation of the execute method and converts the event data back into a tuple:

@Override
public void update(EventBean[] newEvents, EventBean[] oldEvents) {
    if (newEvents != null) {
        for (EventBean newEvent : newEvents) {
            collector.emit(toTuple(newEvent));
        }
    }
}

private List<Object> toTuple(EventBean event) {
    int numFields = outputFields.size();
    List<Object> tuple = new ArrayList<Object>(numFields);

    for (int idx = 0; idx < numFields; idx++) {
        tuple.add(event.get(outputFields.get(idx)));
    }
    return tuple;
}
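
For completeness, since the Esper setup code itself is omitted above: the following is only a rough sketch of how the bolt’s prepare method might wire everything together (the statements, runtime, and collector fields as well as the provider name are assumptions made here; the actual code is in the GitHub repository linked above):

@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;

    // Build an Esper configuration containing one map event type per input stream.
    Configuration configuration = new Configuration();
    setupEventTypes(context, configuration);

    // One Esper engine instance per bolt task (the provider name is an arbitrary choice).
    EPServiceProvider esperSink = EPServiceProviderManager.getProvider(
        "storm-esper-" + context.getThisTaskId(), configuration);

    runtime = esperSink.getEPRuntime();

    // Create each configured statement and register this bolt as its update listener,
    // so that Esper output flows back into the topology via the update method above.
    for (String statement : statements) {
        EPStatement stmt = esperSink.getEPAdministrator().createEPL(statement);
        stmt.addListener(this);
    }
}

With something like this in place the data flow is symmetric: execute pushes Storm tuples into Esper as map events, and update pushes Esper results back out as Storm tuples.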

Written by tomdzk

September 28, 2011 at 9:12 pm

Posted in computers

Lion tweaks

This is a quick summary of all the tweaks handed down from generation to generation (aka found on the Internet), that I applied to a fresh Lion install to get it into a usable state.

Turn firewall on

Go to System Preferences -> Security & Privacy, then click the lock. Now you can start the firewall via the Start button. Also, click on Advanced... and then check Enable stealth mode.

Turn off Autocorrect

Go to System Preferences -> Language & Text -> Text, then uncheck Correct spelling automatically.

Trackpad tweaks

Go to System Preferences -> Trackpad, then uncheck Scroll direction: natural.

Terminal tweaks

Start a new terminal, then open the preferences.

Theme

  • Select Pro, then click Default at the bottom.
  • On the left, click Change next to font and select 14pt font size.
  • Check Antialias text.
  • Uncheck Use bold fonts.

Dimensions

Go to the Window tab, then enter 120 in the Columns input field.

Keybindings

Go to the Keyboard tab. Select the first entry, control cursor left, then click Edit at the bottom.
Delete the last three characters via the Delete one character button, then add a single b (the result should be \033b). Do the same with control cursor right and f so that you get \033f. These two changes give you Control+Left cursor/Right cursor for going to the previous/next word (quite useful for SSH, for instance).

Check Use option as meta key.

Other stuff

On the Advanced tab, uncheck Audible bell and possibly Visual bell.

Use function keys instead of feature keys

Go to System Preferences -> Keyboard, then check Use all F1, F2, etc. keys as standard function keys.

Re-enable key repeat

In a terminal, enter this:

defaults write -g ApplePressAndHoldEnabled -bool false 

Then go to System Preferences -> Keyboard, then move the sliders for Key Repeat and Delay Until Repeat both to the rightmost setting.

Finally, restart the computer.

Disable the new window animation

In a terminal, enter this:

defaults write NSGlobalDomain NSAutomaticWindowAnimationsEnabled -bool NO 

Mission Control tweaks

Go to System Preferences -> Mission Control, then uncheck Show Dashboard as a space and Automatically rearrange spaces based on most recent use.

Go to System Preferences -> Keyboard -> Keyboard Shortcuts, then select Spotlight. Double click the key binding for Show Spotlight window and press Shift+Command+Space.

Now, select Mission Control. Double click the key binding for Mission Control and press Option+Command+Space.

Double click the key binding for Move left a space and then press Option+Command+Left cursor. Similarly, for Move right a space, use Option+Command+Right cursor.
This will enable Option-Left cursor/Right cursor for jumping between words (in addition to the Control-Left cursor/Right cursor that we set up above).

Written by tomdzk

July 30, 2011 at 9:30 pm

Posted in Uncategorized

Internal DNS in Amazon EC2 via tags

The recent EC2 API update from August 31st introduced the concept of tags which allow us to attach somewhat arbitrary metadata to instances and other resources. One useful application that came to my mind is to use them for maintaining internal DNS. The two blog posts here and here describe how to do this using the name of the ssh key that was used to create the instance. However that means that a new ssh key has to be used for each instance which is cumbersome. Tags make this a lot easier, especially since they are automatically returned as part of the instance metadata.

The only problem was that Glenn Rempe’s AWS ruby library didn’t support the new API version yet, so I forked the library, updated it to support the new API version, and added the necessary functions.

So assuming that your instances have a tag named “hostname” that contains the desired short hostname (e.g. db1), with this new gem, the script from the above two blog posts becomes:

#!/usr/bin/env ruby
%w(optparse rubygems AWS resolv pp).each { |l| require l }
options = {}
parser = OptionParser.new do |p|
  p.banner = "Usage: hosts [options]"
  p.on("-a", "--access-key USER", "The user's AWS access key ID.") do |aki|
    options[:access_key_id] = aki
  end
  p.on("-s",
       "--secret-key PASSWORD",
       "The user's AWS secret access key.") do |sak|
    options[:secret_access_key] = sak
  end
  p.on_tail("-h", "--help", "Show this message") {
    puts(p)
    exit
  }
  p.parse!(ARGV)
end
if options.key?(:access_key_id) and options.key?(:secret_access_key)
  puts "127.0.0.1 localhost"

  AWS::EC2::Base.new(options).describe_instances.reservationSet.item.each do |r|
    r.instancesSet.item.each do |i|
      if i.instanceState.name =~ /running/
        tagSet = i.tagSet
        if (!tagSet.nil? && !tagSet.item.nil?)
          tagSet.item.each do |hash|
            if hash.key == 'hostname'
              puts(Resolv::DNS.new.getaddress(
                i.privateDnsName).to_s +" #{hash.value}.ec2 #{hash.value}")
            end
          end
        end
      end
    end
  end
else
  puts(parser)
  exit(1)
end

P.S.: The documentation for the API can be found here.

Written by tomdzk

October 16, 2010 at 1:47 am

Posted in Uncategorized

How to install Scribe with HDFS support on Ubuntu Karmic

Prerequisites

Install some pre-requisites (more might be needed, my system had a bunch of things already):

sudo apt-get install bison flex sun-java6-jdk ruby1.8-dev ant

Create a build folder

We won’t install scribe or thrift on the machine itself; instead we’ll keep everything confined to a folder:

mkdir scribe-build
cd scribe-build
mkdir dist

The dist folder will contain the binary distribution of scribe once we’re done, including all libraries.

Install Boost

On Ubuntu, you can simply install boost via the package manager:

sudo apt-get install libboost1.40-dev libboost-filesystem1.40-dev

These are the only two parts of boost that are needed. Also, please make sure to get at least version 1.40.

If you want to install from source instead, download boost version 1.40 or newer from http://www.boost.org/ (current version is 1.41.0) and then unpack it into the scribe-build folder. After that, cd to the created folder and build it:

cd boost_1_41_0
./bootstrap.sh --prefix=`pwd`/../dist
./bjam install
cd ..

Install Libevent

Again, libevent can simply be installed via the package manager:

sudo apt-get install libevent-dev

On Karmic, this will install libevent (if not installed already) and libevent development files for version 1.4.11 or newer. If you want to install it from source, download the 1.4.x source distribution from http://www.monkey.org/~provos/libevent/ (1.4.13 is the current version) and unpack it into the scribe-build folder. Then cd into the generated folder and build it:

cd libevent-1.4.13-stable
./configure --prefix=`pwd`/../dist
make
make install
cd ..

Thrift and FB303

Download version 0.2.0-incubating from http://incubator.apache.org/thrift/download and unpack it into scribe-build. This should generate a folder scribe-build/thrift-0.2.0. To build it, run:

cd thrift-0.2.0
export PY_PREFIX=`pwd`/../dist
export JAVA_PREFIX=`pwd`/../dist
./configure --prefix=`pwd`/../dist \
    --with-boost=`pwd`/../dist \
    --with-libevent=`pwd`/../dist
make
make install
cd ..

This will most likely throw an error when trying to set up the Ruby bindings since it won’t be allowed to write into the system directory. This is due to a bug in the thrift build scripts: there is no way that I could find to tell it to install the Ruby bindings locally. However, the things that we want will have been installed successfully, so let’s move on.

Next build the FB303 project:

cd contrib/fb303
export PY_PREFIX=`pwd`/../../../dist
./bootstrap.sh \
    --with-thriftpath=`pwd`/../../../dist \
    --with-boost=`pwd`/../../../dist \
    --prefix=`pwd`/../../../dist
make
make install
cd ../../..

Libhdfs

Scribe currently requires libhdfs 0.20.1 with patches applied – the stock version from the Hadoop 0.20.1 distribution won’t work. You can either use the Cloudera 0.20.1 distribution which has these patches applied, or use a newer version – presumably 0.21 works, but I haven’t tried it.

On Ubuntu, you can either install the Cloudera Hadoop distribution via debian packages, or you can compile it from source. The Debian/Ubuntu setup steps are described here:
http://archive.cloudera.com/docs/_apt.html.

We however are going to compile libhdfs from source to get an independent library. Download from
http://archive.cloudera.com/cdh/testing/hadoop-0.20.1+152.tar.gz
and unpack it into the scribe-build folder. This will create a hadoop-0.20.1+152 folder, so let’s go there:

cd hadoop-0.20.1+152

Unfortunately, we also need to tweak two files by adding this line

#include <stdint.h>

near the top, next to the existing includes, in these two files:

src/c++/utils/api/hadoop/SerialUtils.hh
src/c++/pipes/api/hadoop/Pipes.hh

Once you’ve done that, run:

cd src/c++/libhdfs
./configure --enable-shared \
    JVM_ARCH=tune=k8 \
    --prefix=`pwd`/../../../../dist
make
make install
cd ../../../..

Note that this seems to have been fixed in the 0.20.1+168.89 cloudera release.

Build scribe

Download scribe 2.1 from http://github.com/facebook/scribe/downloads or clone the git repository (git://github.com/facebook/scribe.git). If you download the distribution, unpack it into the scribe-build directory, yielding a scribe-build/scribe-2.1 folder. cd to the scribe folder and then run:

cd scribe-2.1
export LD_LIBRARY_PATH="`pwd`/../dist/lib:"\
"/usr/lib/jvm/java-6-sun/jre/lib/amd64:"\
"/usr/lib/jvm/java-6-sun/jre/lib/amd64/server"
export CFLAGS="-I/usr/lib/jvm/java-6-sun/include/ "\
"-I/usr/lib/jvm/java-6-sun/include/linux/"
export LDFLAGS="-L`pwd`/../dist/lib "\
"-L/usr/lib/jvm/java-6-sun/jre/lib/amd64 "\
"-L/usr/lib/jvm/java-6-sun/jre/lib/amd64/server"
export LIBS="-lhdfs -ljvm"
./bootstrap.sh --enable-hdfs \
    --with-hadooppath=`pwd`/../dist \
    --with-boost=`pwd`/../dist \
    --with-thriftpath=`pwd`/../dist \
    --with-fb303path=`pwd`/../dist \
    --prefix=`pwd`/../dist
make
make install
cd ..

Adjust the jre/lib paths in the LDFLAGS to match your environment (e.g. 32bit vs. 64bit). The HDFS/Hadoop path in there is optional (i.e. enabled via the --enable-hdfs option) and only required if you want HDFS support.

Test that it works

Simply start scribe with the library path set correctly:

cd dist
export LD_LIBRARY_PATH="`pwd`/lib"
./bin/scribed ../scribe-2.1/examples/example1.conf

This should generate output like this:

[Tue Jan 19 00:31:07 2010] "STATUS: STARTING"
[Tue Jan 19 00:31:07 2010] "STATUS: configuring"
[Tue Jan 19 00:31:07 2010] "got configuration data from file "
[Tue Jan 19 00:31:07 2010] "CATEGORY : default"
[Tue Jan 19 00:31:07 2010] "Creating default store"
[Tue Jan 19 00:31:07 2010] "configured  stores"
[Tue Jan 19 00:31:07 2010] "STATUS: "
[Tue Jan 19 00:31:07 2010] "STATUS: ALIVE"
[Tue Jan 19 00:31:07 2010] "Starting scribe server on port 1463"

Written by tomdzk

January 19, 2010 at 12:32 am

Posted in Uncategorized

Setting up reconnoiter on Ubuntu (Karmic and newer)

After it took me about two days to figure out how to set up reconnoiter, I figured it would be nice to document the steps so that it will be much easier for other people.

Note: This guide was written for Karmic Koala (9.10) and Lucid Lynx (10.04). It should generally work for Jaunty, too, as well as other Linux distributions (minus the package manager instructions obviously).

Note: This guide has been updated to reconnoiter trunk revision 1404.

Before we begin, here are some useful links:

Reconnoiter home page: https://labs.omniti.com/trac/reconnoiter

Reconnoiter docs: http://labs.omniti.com/docs/reconnoiter/

Oscon demo: http://omniti.com/video/noit-oscon-demo

1. Build it

First, let’s install a bunch of things. In the Synaptic Package Manager under Settings -> Repositories -> Other Software enable the two entries for the partner repositories. Then

sudo apt-get install autoconf build-essential libtool gettext \
  zlib1g-dev uuid-dev libpcre3-dev libssl-dev libpq-dev \
  libxml2-dev libxslt-dev libapr1-dev libaprutil1-dev xsltproc \
  libncurses5-dev libssh2-1-dev libsnmp-dev libmysqlclient-dev \
  subversion sun-java6-jdk 

Now we check out reconnoiter from subversion and build it:

svn co https://labs.omniti.com/reconnoiter/trunk reconnoiter
cd reconnoiter
autoconf
./configure
make
sudo mkdir -p /usr/local/java/lib
sudo make install

2. Setup the DB

We need PostgreSQL 8.4 server & client. On Karmic you get that via

sudo apt-get install postgresql postgresql-client

For Jaunty, follow the steps here.

Next, make sure that the postgresql config file allows local access without password. Edit the /etc/postgresql/8.4/main/pg_hba.conf to change the local entry to use “trust”:

local   all         all                               trust

After that, restart the postgresql server:

sudo /etc/init.d/postgresql-8.4 restart

Now log into PostgreSQL:

sudo su postgres
cd sql
psql

Within psql do

\i scaffolding.sql
\q

3. Setup cron

First, we need to change the crontab to point to where postgresql is actually installed:

exit
sed -i 's/\/opt\/psql835/\/usr/g' sql/crontab
sudo su postgres
cd sql

We also need to run the commands in the crontab at least once manually as they will initialize certain database structures. As the postgres user:

eval "`cat crontab | cut -d' ' -f6- | grep -v ^$ | awk '{print $0\";\"}'`"

Finally, and still as user postgres do

crontab crontab
exit

4. Setup the web ui

For configuring the web UI (PHP), we first need Apache2 and PHP:

sudo apt-get install apache2 libapache2-mod-php5 php5-pgsql

This will also enable mod_php5. Every other required module (mod_mime, mod_log_config, mod_rewrite, mod_proxy, mod_proxy_http, mod_authz_host) should already be enabled or even compiled into the server (apache2 -l will show them). To make sure that they are enabled, simply do

sudo a2enmod mime
sudo a2enmod rewrite
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod authz_host

Next, we need the Apache configuration, either as a new file /etc/apache2/sites-available/reconnoiter that is then symlinked into /etc/apache2/sites-enabled, or in the current configuration (e.g. /etc/apache2/sites-enabled/000-default). A sample configuration to set up reconnoiter on port 80:

<VirtualHost *:80>
  ServerAdmin webmaster@localhost
  DocumentRoot @ROOT@/ui/web/htdocs

  <Directory "/">
      Options None
      AllowOverride None
      Order allow,deny
      Deny from all
  </Directory>
  <FilesMatch "^\.ht">
      Order allow,deny
      Deny from all
      Satisfy All
  </FilesMatch>
  <Directory "@ROOT@/ui/web/htdocs/">
      php_value include_path @ROOT@/ui/web/lib
      php_value short_open_tag off
      Options FollowSymLinks Indexes
      AllowOverride All
      Order deny,allow
      Allow from all
  </Directory>

  LogLevel warn
  LogFormat "%h %l %u %t \"%r\" %>s %b" common

  ErrorLog @ROOT@/ui/web/logs/error_log
  CustomLog @ROOT@/ui/web/logs/access_log common

  AddType application/x-compress .Z
  AddType application/x-gzip .gz .tgz
  AddType application/x-httpd-php .php
  DefaultType text/plain
</VirtualHost>

Replace @ROOT@ with the directory where you have installed reconnoiter.

If you chose to add reconnoiter to the Apache config on a different port than 80, say 9090, then you will also have to change Apache’s port configuration in /etc/apache2/ports.conf by adding:

NameVirtualHost *:9090
Listen 9090

Then restart apache:

sudo /etc/init.d/apache2 restart

5. Generate test certificates

These steps show how to generate test certificates. In a production environment you would of course use a real CA.

Create/go to a temporary directory:

mkdir ssh-keys
cd ssh-keys

Next, create a file named openssl.cnf in it with the following contents:

HOME = .
RANDFILE = $ENV::HOME/.rnd

oid_section = new_oids

[ new_oids ]

[ ca ]
default_ca = CA_default

[ CA_default ]
dir = ./testCA
certs = $dir/certs
crl_dir = $dir/crl
database = $dir/index.txt
new_certs_dir = $dir/newcerts
certificate = $dir/cacert.pem
serial = $dir/serial
crl = $dir/crl.pem
private_key = $dir/private/cakey.pem
RANDFILE = $dir/private/.rand
x509_extensions = usr_cert
name_opt = ca_default
cert_opt = ca_default
default_days = 365
default_crl_days = 30
default_md = md5
preserve = no
policy = policy_match

[ policy_match ]
countryName = match
stateOrProvinceName = match
organizationName = match
organizationalUnitName = optional
commonName = supplied
emailAddress = optional

[ policy_anything ]
countryName = optional
stateOrProvinceName = optional
localityName = optional
organizationName = optional
organizationalUnitName	= optional
commonName	 = supplied
emailAddress = optional

[ req ]
default_bits = 1024
default_keyfile = privkey.pem
distinguished_name = req_distinguished_name
attributes = req_attributes
x509_extensions = v3_ca
string_mask = nombstr

[ req_distinguished_name ]
countryName = Country Name (2 letter code)
countryName_default = AU
countryName_min = 2
countryName_max = 2
stateOrProvinceName = State or Province Name (full name)
stateOrProvinceName_default = Some-State
localityName = Locality Name (eg, city)
0.organizationName = Organization Name (eg, company)
0.organizationName_default = Internet Widgits Pty Ltd
organizationalUnitName = Organizational Unit Name (eg, section)
commonName = Common Name (eg, YOUR name)
commonName_max = 64
emailAddress = Email Address
emailAddress_max = 64

[ req_attributes ]
challengePassword = A challenge password
challengePassword_min	= 4
challengePassword_max = 20
unstructuredName = An optional company name

[ usr_cert ]
basicConstraints = CA:FALSE
nsComment = "OpenSSL Generated Certificate"
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer:always

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment

[ v3_ca ]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer:always
basicConstraints = CA:true

[ crl_ext ]
authorityKeyIdentifier = keyid:always,issuer:always

[ proxy_cert_ext ]
basicConstraints = CA:FALSE
nsComment = "OpenSSL Generated Certificate"
subjectKeyIdentifier = hash
authorityKeyIdentifier  = keyid,issuer:always
proxyCertInfo = critical,language:id-ppl-anyLanguage,pathlen:3,policy:foo

Next execute these commands:

mkdir testCA
touch testCA/index.txt
test -f testCA/serial || echo 00 > testCA/serial

# CA
openssl genrsa -out test-ca.key
openssl req -key test-ca.key -days 365 \
    -new -out test-ca.csr -config openssl.cnf \
    -subj "/C=US/ST=California/O=Ning Inc./CN=Reconnoiter Test CA"
openssl x509 -req -in test-ca.csr -signkey test-ca.key \
    -out test-ca.crt

# noit
openssl genrsa -out test-noit.key
openssl req -key test-noit.key -days 365 \
    -new -out test-noit.csr -config openssl.cnf \
    -subj "/C=US/ST=California/O=Ning Inc./CN=noit-test"
openssl ca -batch -config openssl.cnf \
    -in test-noit.csr -out test-noit.crt \
    -outdir . -keyfile test-ca.key -cert test-ca.crt -days 120

# stratcon
openssl genrsa -out test-stratcon.key
openssl req -key test-stratcon.key -days 365 \
    -new -out test-stratcon.csr -config openssl.cnf \
    -subj "/C=US/ST=California/O=Ning Inc./CN=stratcon"
openssl ca -batch -config openssl.cnf \
    -in test-stratcon.csr -out test-stratcon.crt \
    -outdir . -keyfile test-ca.key -cert test-ca.crt -days 120

This will create a bunch of .pem, .crt, .csr, and .key files that you should copy to /usr/local/etc:

sudo cp *.pem *.crt *.csr *.key /usr/local/etc

6. Setup a noit daemon

Generate the config:

sudo cp src/noit.conf /usr/local/etc/

Now you can edit that file to your heart’s content. Some things to note

  • Comment out/remove sections as necessary, or make sure that they point to existing machines.
  • For every new item, create a new uuid using the uuidgen tool that was installed earlier.
  • Update the sslconfig section to use the test certificates:
    <sslconfig>
      <optional_no_ca>false</optional_no_ca>
      <certificate_file>/usr/local/etc/test-noit.crt</certificate_file>
      <key_file>/usr/local/etc/test-noit.key</key_file>
      <ca_chain>/usr/local/etc/test-ca.crt</ca_chain>
    </sslconfig>
    
  • For snmp entries, make sure you have the community set correctly (see https://labs.omniti.com/docs/reconnoiter/ch05s14.html).

Finally start the noit daemon:

sudo /usr/local/sbin/noitd -c /usr/local/etc/noit.conf -D

The -D option is for debugging purposes. It will tell noitd to run in the foreground and log everything to stdout/stderr. You also might want to tweak the logging settings in the configuration file. Turn on debug logging by changing this line near the top of the config file:

<log name="debug" disabled="true"/>

to

<log name="debug"/>

Then switch on debug logging for whichever specific modules you want. E.g. for snmp debug logging, change this line further down in the config file:

<log name="debug/snmp" disabled="true"/>

to

<log name="debug/snmp"/>

7. Setup a stratcon daemon

Again, create the config file using the sample config file:

sudo cp src/stratcon.conf /usr/local/etc/

Edit as necessary:

  • Logging is configured in the same way as for noit above.
  • Set the password in the database config section to stratcon (or whatever you chose in the scaffolding.sql above).
  • For each noitd instance there needs to be a noitd section.
  • Configure the listeners section, esp. the port (should be an unused one), the hostname and document_domain.
  • Update the sslconfig sections (there are two of them, one in the noits section and one in the listeners section) to use the test certificates:
    <sslconfig>
      <key_file>/usr/local/etc/test-stratcon.key</key_file>
      <certificate_file>/usr/local/etc/test-stratcon.crt</certificate_file>
      <ca_chain>/usr/local/etc/test-ca.crt</ca_chain>
    </sslconfig>
    

Finally start the stratcon daemon:

sudo /usr/local/sbin/stratcond -c /usr/local/etc/stratcon.conf -D

Again, the -D option is for debugging. You can tweak the logging settings in pretty much the same way as for noitd.

8. Verification

In your browser (note that the UI doesn’t quite work in Chrome), go to http://localhost:9090. The reconnoiter UI should appear. On the left side click the + next to “Graph Controls” and then on “Browse Data”. The data that you configured for noitd above should show up, though it might take a few minutes between starting noitd and the first data showing up.

Relevant logs are:

  • /var/log/postgresql/postgresql-8.4-main.log
  • /tmp/rollup.log – the log created by the cron rollup job
  • /var/log/syslog
  • @ROOT@/ui/web/logs/error_log and @ROOT@/ui/web/logs/access_log

Written by tomdzk

November 24, 2009 at 4:35 pm

Posted in computers
