Storm & Esper
At work, we recently started using Esper for realtime analytics, and so far we quite like it. It is great at what it does – running queries continuously over data. The problem then becomes how to get data into Esper in the first place. The recently released Storm could be one way to do that, so I got curious and started playing around with it to see if it could be made to work with Esper. It turns out the integration is straightforward.
Some Storm basics
Storm has three basic concepts that are relevant in this context: streams, spouts, and bolts. At the core, Storm facilitates data transfer between spouts and bolts using streams of tuples.
Spouts are the basic data emitters, typically retrieving the data from outside of the Storm cluster. A simple example of this would be a spout that retrieves the tweet stream via the Twitter API and emits the tweets as a stream into the Storm cluster.
Bolts are data processors that receive one or more streams as input and potentially also emit (processed) data on one or more streams. In the Twitter example, one could for instance imagine bolts that count the number of tweets per second, or that detect the language of each tweet and re-emit the tweets on per-language streams.
The data in the streams has a simple tuple form consisting of a fixed number of named values called fields. Storm does not care about the data types of the individual fields in the tuple as long as they can be serialized to the wire format (which is Thrift), whether via serializers provided by Storm or custom ones. Spouts and bolts need to declare the number of fields and their names for each of the tuples they are going to emit as part of the initial setup of the topology. This also means that the number of fields and their names are fixed for the duration of a Storm ‘session’.
Some Esper basics
Esper is, and I'm simplifying things quite a bit here, an engine that processes data streams by running queries over them continuously. Think of it as a way to run SQL-like queries on data that streams by. Because the queries run continuously, they have a time or amount-of-data aspect to them. Continuing the Twitter example from above, if we consider the never-ending stream of tweets as the data stream that Esper works with, then an Esper query could for instance return the number of tweets per second like so:
select count(*) as tps from Twitter.win:time_batch(1 sec)
The time_batch part in this example directs Esper to apply the count function to 1-second batches of events.
Esper data streams consist of structured data called events. The types of these events can be POJOs, maps, and other things. Event types are typically registered with Esper before a query that uses them is submitted. This means that you have to tell Esper which kind of event you are going to give it (Java class, map, …) and which properties the event type has. For Java classes, Esper can figure that out by itself, but for map events you need to tell it explicitly about the possible keys and their value data types. Fortunately, Esper is forgiving when it comes to the data types: you can tell it that you'll give it Objects, and it will happily accept numbers in the actual data stream and perform numeric operations on them.
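To make this concrete, here is a small, self-contained sketch of how a map event type and a continuous statement can be registered with Esper. The event type name "Twitter", its properties and the listener are just illustrative assumptions, not code from the Storm integration discussed below:

// Standalone sketch: register a map event type and a continuous query with Esper.
import java.util.HashMap;
import java.util.Map;

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;
import com.espertech.esper.client.EventBean;
import com.espertech.esper.client.UpdateListener;

public class EsperMapEventExample {
    public static void main(String[] args) {
        // Declare a map event type called "Twitter" with loosely typed properties.
        Map<String, Object> eventProperties = new HashMap<String, Object>();
        eventProperties.put("createdAt", Object.class);
        eventProperties.put("retweetCount", Object.class);

        Configuration config = new Configuration();
        config.addEventType("Twitter", eventProperties);

        EPServiceProvider provider = EPServiceProviderManager.getDefaultProvider(config);

        // The tweets-per-second query from above, registered as a continuous statement.
        EPStatement statement = provider.getEPAdministrator().createEPL(
                "select count(*) as tps from Twitter.win:time_batch(1 sec)");
        statement.addListener(new UpdateListener() {
            public void update(EventBean[] newEvents, EventBean[] oldEvents) {
                if (newEvents != null) {
                    System.out.println("tps: " + newEvents[0].get("tps"));
                }
            }
        });

        // Events are handed to the engine as maps keyed by the declared property names.
        Map<String, Object> event = new HashMap<String, Object>();
        event.put("createdAt", System.currentTimeMillis());
        event.put("retweetCount", 3);
        provider.getEPRuntime().sendEvent(event, "Twitter");
    }
}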
How to combine the two
Storm's tuples are quite similar to Esper's map event types. The tuple field names map naturally to map keys, and the field values to the values for those keys. The tuple fields are not typed when they are declared, but that does not pose a big problem for us as we can simply tell Esper that they are of type Object. In addition, the fact that tuples have to be declared before a topology is run makes it relatively easy for us to define the map event types in the setup phase.
I am going to use the twitter stream example from the storm-starter project to show how you can use Esper to count the number of tweets per second and also find the maximum number of retweets per 1 second interval. This is probably not of great practical use, but will show off some aspects of the Storm – Esper integration.
An up-to-date version of this code is available on GitHub.
Let’s get started with the twitter spout, a slightly adapted version of the one from the storm-starter project:
public class TwitterSpout implements IRichSpout, StatusListener {
    private static final long serialVersionUID = 1L;

    private final String username;
    private final String pwd;
    private transient BlockingQueue<Status> queue;
    private transient SpoutOutputCollector collector;
    private transient TwitterStream twitterStream;

    public TwitterSpout(String username, String pwd) {
        this.username = username;
        this.pwd = pwd;
    }

    @Override
    public boolean isDistributed() {
        return false;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("createdAt", "retweetCount"));
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.queue = new ArrayBlockingQueue<Status>(1000);
        this.collector = collector;

        Configuration twitterConf = new ConfigurationBuilder().setUser(username)
                                                              .setPassword(pwd)
                                                              .build();
        TwitterStreamFactory fact = new TwitterStreamFactory(twitterConf);

        twitterStream = fact.getInstance();
        twitterStream.addListener(this);
        twitterStream.sample();
    }

    @Override
    public void onStatus(Status status) {
        queue.offer(status);
    }

    @Override
    public void nextTuple() {
        Status value = queue.poll();

        if (value == null) {
            Utils.sleep(50);
        }
        else {
            collector.emit(tuple(value.getCreatedAt().getTime(), value.getRetweetCount()));
        }
    }

    @Override
    public void close() {
        twitterStream.shutdown();
    }

    @Override
    public void ack(Object arg0) {}

    @Override
    public void fail(Object arg0) {}

    @Override
    public void onException(Exception ex) {}

    @Override
    public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {}

    @Override
    public void onTrackLimitationNotice(int numberOfLimitedStatuses) {}

    @Override
    public void onScrubGeo(long userId, long upToStatusId) {}
}
This defines a spout that emits a single stream of tuples with two fields, createdAt (a timestamp) and retweetCount (an integer).
You'll notice that aside from the Twitter username and password, all fields in the spout are marked as transient and are initialized in the open method. The reason for this is that Storm requires spouts and bolts to be serializable so that it can move them to a node in the Storm cluster before starting the topology.
The Esper bolt itself is generic. You pass it Esper statements and the names of the output fields that these statements will generate. The adapted main method for our Twitter example looks like this:
public static void main(String[] args) {
    final String username = args[0];
    final String pwd = args[1];

    TopologyBuilder builder = new TopologyBuilder();
    TwitterSpout spout = new TwitterSpout(username, pwd);
    EsperBolt bolt =
        new EsperBolt(new Fields("tps", "maxRetweets"),
                      "select count(*) as tps, max(retweetCount) as maxRetweets from Storm.win:time_batch(1 sec)");

    builder.setSpout(1, spout);
    builder.setBolt(2, bolt).shuffleGrouping(1);

    Config conf = new Config();
    conf.setDebug(true);

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("test", conf, builder.createTopology());
    Utils.sleep(10000);
    cluster.shutdown();
}
Note how the Esper statement returns tps and maxRetweets, which are also declared as the two output fields for the bolt.
The bolt code itself (see a version of this that is kept up-to-date with Storm here) consists of three pieces. The setup part constructs map event types for each input stream and registers them with Esper (I omitted the Esper setup code):
private void setupEventTypes(TopologyContext context, Configuration configuration) {
    Set<GlobalStreamId> sourceIds = context.getThisSources().keySet();

    singleEventType = (sourceIds.size() == 1);
    for (GlobalStreamId id : sourceIds) {
        Map<String, Object> props = new LinkedHashMap<String, Object>();

        setupEventTypeProperties(
            context.getComponentOutputFields(id.get_componentId(), id.get_streamId()),
            props);
        configuration.addEventType(
            getEventTypeName(id.get_componentId(), id.get_streamId()),
            props);
    }
}

private String getEventTypeName(int componentId, int streamId) {
    if (singleEventType) {
        return "Storm";
    }
    else {
        return String.format("Storm_%d_%d", componentId, streamId);
    }
}

private void setupEventTypeProperties(Fields fields, Map<String, Object> properties) {
    int numFields = fields.size();

    for (int idx = 0; idx < numFields; idx++) {
        properties.put(fields.get(idx), Object.class);
    }
}
The field-to-property mapping is straightforward. It simply registers properties of type Object under the field names in the event type corresponding to the input stream. If the bolt only has a single input stream, then it registers a single event type called Storm. For multiple input streams, it uses the component id (the id of the spout or bolt that the data comes from) and the stream id (spouts and bolts can emit multiple streams) to generate a name of the form Storm_{component id}_{stream id}.
The second part is the transfer of data from Storm to Esper:
@Override
public void execute(Tuple tuple) {
    String eventType = getEventTypeName(tuple.getSourceComponent(), tuple.getSourceStreamId());
    Map<String, Object> data = new HashMap<String, Object>();
    Fields fields = tuple.getFields();
    int numFields = fields.size();

    for (int idx = 0; idx < numFields; idx++) {
        String name = fields.get(idx);
        Object value = tuple.getValue(idx);

        data.put(name, value);
    }
    runtime.sendEvent(data, eventType);
}
This method is called by Storm whenever a tuple from any of the connected streams is sent to the bolt. The code therefore first has to find the event type name corresponding to the tuple. Then it iterates over the fields in the tuple and puts the values into a map using the field names as the keys. Finally, it passes that map to Esper.
At this moment, Esper will route this map (the event) through the statements, which in turn might produce new data that we need to hand back to Storm. For this purpose, the bolt registered itself as a listener for data emitted from any of the statements that we configured during setup. Esper will then call back the update method on the bolt whenever one of the statements generates data. The update method basically performs the reverse operation of the execute method and converts the event data back into a tuple:
@Override
public void update(EventBean[] newEvents, EventBean[] oldEvents) {
    if (newEvents != null) {
        for (EventBean newEvent : newEvents) {
            collector.emit(toTuple(newEvent));
        }
    }
}

private List<Object> toTuple(EventBean event) {
    int numFields = outputFields.size();
    List<Object> tuple = new ArrayList<Object>(numFields);

    for (int idx = 0; idx < numFields; idx++) {
        tuple.add(event.get(outputFields.get(idx)));
    }
    return tuple;
}
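For completeness, here is a rough sketch of what the omitted pieces of the bolt (the output field declaration and the Esper setup) could look like. The member names (esperSink, statements, outputFields, collector, runtime) and the per-task provider name are assumptions; the actual, maintained implementation is in the GitHub repository linked above.

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // The output fields passed to the bolt's constructor become the declared tuple fields.
    declarer.declare(outputFields);
}

@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;

    // Register one map event type per input stream (see setupEventTypes above).
    Configuration configuration = new Configuration();
    setupEventTypes(context, configuration);

    // Assumed: one engine instance per bolt task, keyed by the task id.
    this.esperSink = EPServiceProviderManager.getProvider(
        "EsperBolt-" + context.getThisTaskId(), configuration);
    this.runtime = esperSink.getEPRuntime();

    // Create the configured statements and listen for the rows they produce.
    for (String statementText : statements) {
        EPStatement statement = esperSink.getEPAdministrator().createEPL(statementText);
        statement.addListener(this);
    }
}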
Lion tweaks
This is a quick summary of all the tweaks handed down from generation to generation (aka found on the Internet), that I applied to a fresh Lion install to get it into a usable state.
Turn firewall on
Go to System Preferences -> Security & Privacy, then click the lock. Now you can start the firewall via the Start button. Also, click on Advanced... and then check Enable stealth mode.
Turn off Autocorrect
Go to System Preferences -> Language & Text -> Text, then uncheck Correct spelling automatically.
Trackpad tweaks
Go to System Preferences -> Trackpad, then uncheck Scroll direction: natural.
Terminal tweaks
Start a new terminal, then open the preferences.
Theme
- Select Pro, then click Default at the bottom.
- On the left, click Change next to font and select the 14pt font size.
- Check Antialias text.
- Uncheck Use bold fonts.
Dimensions
Go to the Window tab, then enter 120 in the Columns input field.
Keybindings
Go to the Keyboard tab. Select the first entry, control cursor left, then click Edit at the bottom. Delete the last three characters via the Delete one character button, then add a single b (the result should be \033b). Do the same with control cursor right and f so that you get \033f. These two changes give you Control+Left cursor/Right cursor for going to the previous/next word (quite useful for SSH, for instance).
Check Use option as meta key.
Other stuff
On the Advanced tab, uncheck Audible bell and possibly Visual bell.
Use function keys instead of feature keys
Go to System Preferences -> Keyboard, then check Use all F1, F2, etc. keys as standard function keys.
Re-enable key repeat
In a terminal, enter this:
defaults write -g ApplePressAndHoldEnabled -bool false
Then go to System Preferences -> Keyboard and move the sliders for Key Repeat and Delay Until Repeat both to the rightmost setting.
Finally, restart the computer.
Disable the new window animation
In a terminal, enter this:
defaults write NSGlobalDomain NSAutomaticWindowAnimationsEnabled -bool NO
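Should you ever want to undo this tweak or the key-repeat one above, deleting the keys should restore the system defaults:

defaults delete -g ApplePressAndHoldEnabled
defaults delete NSGlobalDomain NSAutomaticWindowAnimationsEnabled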
Mission Control tweaks
Go to System Preferences -> Mission Control, then uncheck Show Dashboard as a space and Automatically rearrange spaces based on most recent use.
Go to System Preferences -> Keyboard -> Keyboard Shortcuts, then select Spotlight. Double-click the key binding for Show Spotlight window and press Shift+Command+Space.
Now select Mission Control. Double-click the key binding for Mission Control and press Option+Command+Space.
Double-click the key binding for Move left a space and then press Option+Command+Left cursor. Similarly, for Move right a space, use Option+Command+Right cursor.
This will enable Option+Left cursor/Right cursor for jumping between words (in addition to the Control+Left cursor/Right cursor that we set up above).
Internal DNS in Amazon EC2 via tags
The recent EC2 API update from August 31st introduced the concept of tags, which allow us to attach somewhat arbitrary metadata to instances and other resources. One useful application that came to my mind is to use them for maintaining internal DNS. The two blog posts here and here describe how to do this using the name of the ssh key that was used to create the instance. However, that means that a new ssh key has to be used for each instance, which is cumbersome. Tags make this a lot easier, especially since they are automatically returned as part of the instance metadata.
The only problem was that Glenn Rempe's AWS ruby library doesn't support the new API version yet. So I forked the library, updated it to support the new API version, and added the necessary functions.
Assuming that your instances have a tag named "hostname" that contains the desired short hostname (e.g. db1), with this new gem the script from the above two blog posts becomes:
#!/usr/bin/env ruby

%w(optparse rubygems AWS resolv pp).each { |l| require l }

options = {}
parser = OptionParser.new do |p|
  p.banner = "Usage: hosts [options]"
  p.on("-a", "--access-key USER", "The user's AWS access key ID.") do |aki|
    options[:access_key_id] = aki
  end
  p.on("-s", "--secret-key PASSWORD", "The user's AWS secret access key.") do |sak|
    options[:secret_access_key] = sak
  end
  p.on_tail("-h", "--help", "Show this message") {
    puts(p)
    exit
  }
  p.parse!(ARGV)
end

if options.key?(:access_key_id) and options.key?(:secret_access_key)
  puts "127.0.0.1 localhost"

  AWS::EC2::Base.new(options).describe_instances.reservationSet.item.each do |r|
    r.instancesSet.item.each do |i|
      if i.instanceState.name =~ /running/
        tagSet = i.tagSet
        if (!tagSet.nil? && !tagSet.item.nil?)
          tagSet.item.each do |hash|
            if hash.key == 'hostname'
              puts(Resolv::DNS.new.getaddress(i.privateDnsName).to_s +
                   " #{hash.value}.ec2 #{hash.value}")
            end
          end
        end
      end
    end
  end
else
  puts(parser)
  exit(1)
end
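To use it, you would run the script periodically (e.g. from cron) and write its output to /etc/hosts. The file name ec2-hosts.rb below is just a placeholder for wherever you saved the script:

# Regenerate /etc/hosts from the current set of running, tagged instances.
ruby ec2-hosts.rb -a YOUR_ACCESS_KEY_ID -s YOUR_SECRET_ACCESS_KEY > /etc/hosts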
P.S.: The documentation for the API can be found here.
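For reference, the "hostname" tag itself can be attached after an instance is launched, for example with the EC2 API command line tools (the instance id below is made up):

# Attach the hostname tag that the script above looks for.
ec2-create-tags i-12345678 --tag hostname=db1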
How to install Scribe with HDFS support on Ubuntu Karmic
Prerequisites
Install some prerequisites (more might be needed; my system already had a bunch of things installed):
sudo apt-get install bison flex sun-java6-jdk ruby1.8-dev ant
Create a build folder
We won't install scribe or thrift on the machine itself; instead, we'll keep everything confined to a folder. For this we should
mkdir scribe-build
cd scribe-build
mkdir dist
The dist folder will contain the binary distribution of scribe once we’re done, including all libraries.
Install Boost
On Ubuntu, you can simply install boost via the package manager:
sudo apt-get install libboost1.40-dev libboost-filesystem1.40-dev
These are the only two parts of boost that are needed. Also, please make sure to get at least version 1.40.
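If you want to double-check which boost version the package manager actually installed, dpkg can tell you:

dpkg -s libboost1.40-dev libboost-filesystem1.40-dev | grep -E '^(Package|Version)'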
If you want to install from source instead, download boost version 1.40 or newer from http://www.boost.org/ (current version is 1.41.0) and then unpack it into the scribe-build folder. After that, cd to the created folder and build it:
cd boost_1_41_0
./bootstrap.sh --prefix=`pwd`/../dist
./bjam install
cd ..
Install Libevent
Again, libevent can simply be installed via the package manager:
sudo apt-get install libevent-dev
On Karmic, this will install libevent (if not installed already) and libevent development files for version 1.4.11 or newer. If you want to install it from source, download the 1.4.x source distribution from http://www.monkey.org/~provos/libevent/ (1.4.13 is the current version) and unpack it into the scribe-build folder. Then cd into the generated folder and build it:
cd libevent-1.4.13-stable
./configure --prefix=`pwd`/../dist
make
make install
cd ..
Thrift and FB303
Download version 0.2.0-incubating from http://incubator.apache.org/thrift/download and unpack it into scribe-build. This should generate a folder scribe-build/thrift-0.2.0. To build it, run:
cd thrift-0.2.0
export PY_PREFIX=`pwd`/../dist
export JAVA_PREFIX=`pwd`/../dist
./configure --prefix=`pwd`/../dist \
            --with-boost=`pwd`/../dist \
            --with-libevent=`pwd`/../dist
make
make install
cd ..
This will most likely throw an error when trying to set up the ruby binding since it won't be allowed to write into the system directory. This is due to a bug in the thrift build scripts – there is no way that I could find to tell it to install the ruby bindings locally. However, the things that we want will have been installed successfully, so let's move on.
Next build the FB303 project:
cd contrib/fb303
export PY_PREFIX=`pwd`/../../../dist
./bootstrap.sh --with-thriftpath=`pwd`/../../../dist \
               --with-boost=`pwd`/../../../dist \
               --prefix=`pwd`/../../../dist
make
make install
cd ../../..
Libhdfs
Scribe currently requires libhdfs 0.20.1 with patches applied – the stock version from the Hadoop 0.20.1 distribution won’t work. You can either use the Cloudera 0.20.1 distribution which has these patches applied, or use a newer version – presumably 0.21 works, but I haven’t tried it.
On Ubuntu, you can either install the Cloudera Hadoop distribution via debian packages, or you can compile it from source. The Debian/Ubuntu setup steps are described here:
http://archive.cloudera.com/docs/_apt.html.
We however are going to compile libhdfs from source to get an independent library. Download from
http://archive.cloudera.com/cdh/testing/hadoop-0.20.1+152.tar.gz
and unpack it into the scribe-build folder. This will create a hadoop-0.20.1+152 folder, so let’s go there:
cd hadoop-0.20.1+152
Unfortunately, we also need to tweak two files by adding the line
#include <stdint.h>
near the top of the existing includes in these two files:
src/c++/utils/api/hadoop/SerialUtils.hh
src/c++/pipes/api/hadoop/Pipes.hh
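If you prefer to apply that tweak non-interactively, something like the following should work with GNU sed; it simply prepends the include to each file, which is sufficient for the build:

# Prepend the missing include to both headers.
sed -i '1i #include <stdint.h>' src/c++/utils/api/hadoop/SerialUtils.hh
sed -i '1i #include <stdint.h>' src/c++/pipes/api/hadoop/Pipes.hh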
Once you’ve done that, run:
cd src/c++/libhdfs
./configure --enable-shared \
            JVM_ARCH=tune=k8 \
            --prefix=`pwd`/../../../../dist
make
make install
cd ../../../..
Note that this seems to have been fixed in the 0.20.1+168.89 Cloudera release.
Build scribe
Download scribe 2.1 from http://github.com/facebook/scribe/downloads or clone the git repository (git://github.com/facebook/scribe.git). If you download the distribution, unpack it into the scribe-build directory, yielding a scribe-build/scribe-2.1 folder. cd to the scribe folder and then run:
cd scribe-2.1
export LD_LIBRARY_PATH="`pwd`/../dist/lib:"\
"/usr/lib/jvm/java-6-sun/jre/lib/amd64:"\
"/usr/lib/jvm/java-6-sun/jre/lib/amd64/server"
export CFLAGS="-I/usr/lib/jvm/java-6-sun/include/ "\
"-I/usr/lib/jvm/java-6-sun/include/linux/"
export LDFLAGS="-L`pwd`/../dist/lib "\
"-L/usr/lib/jvm/java-6-sun/jre/lib/amd64 "\
"-L/usr/lib/jvm/java-6-sun/jre/lib/amd64/server"
export LIBS="-lhdfs -ljvm"
./bootstrap.sh --enable-hdfs \
               --with-hadooppath=`pwd`/../dist \
               --with-boost=`pwd`/../dist \
               --with-thriftpath=`pwd`/../dist \
               --with-fb303path=`pwd`/../dist \
               --prefix=`pwd`/../dist
make
make install
cd ..
Adjust the jre/lib paths in the LDFLAGS to match your environment (e.g. 32-bit vs. 64-bit). The HDFS/Hadoop path in there is optional (i.e. enabled via the --enable-hdfs option) and only required if you want HDFS support.
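If you are not sure which JVM paths apply on your system, you can simply look for the relevant files under the JVM install directory:

# Where libjvm.so lives determines the jre/lib path to use in LDFLAGS/LD_LIBRARY_PATH.
find /usr/lib/jvm/java-6-sun -name 'libjvm.so'
# Where jni.h lives determines the include path to use in CFLAGS.
find /usr/lib/jvm/java-6-sun -name 'jni.h'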
Test that it works
Simply start scribe with the library path set correctly:
cd dist
export LD_LIBRARY_PATH="`pwd`/lib"
./bin/scribed ../scribe-2.1/examples/example1.conf
This should generate output like this:
[Tue Jan 19 00:31:07 2010] "STATUS: STARTING"
[Tue Jan 19 00:31:07 2010] "STATUS: configuring"
[Tue Jan 19 00:31:07 2010] "got configuration data from file "
[Tue Jan 19 00:31:07 2010] "CATEGORY : default"
[Tue Jan 19 00:31:07 2010] "Creating default store"
[Tue Jan 19 00:31:07 2010] "configured stores"
[Tue Jan 19 00:31:07 2010] "STATUS: "
[Tue Jan 19 00:31:07 2010] "STATUS: ALIVE"
[Tue Jan 19 00:31:07 2010] "Starting scribe server on port 1463"
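As an additional quick sanity check, you can verify from a second terminal that the daemon is actually listening on its default port (1463, as shown in the output above):

# scribed should show up as listening on port 1463.
netstat -tln | grep 1463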
Setting up reconnoiter on Ubuntu (Karmic and newer)
After it took me about 2 days to figure out how to set up reconnoiter, I figured it would be nice to document the steps so that it will be much easier for other people.
Note: This guide was written for Karmic Koala (9.10) and Lucid Lynx (10.04). It should generally work for Jaunty, too, as well as other Linux distributions (minus the package manager instructions obviously).
Note: This guide has been updated to reconnoiter trunk revision 1404.
Before we begin, here are some useful links:
Reconnoiter home page: https://labs.omniti.com/trac/reconnoiter
Reconnoiter docs: http://labs.omniti.com/docs/reconnoiter/
Oscon demo: http://omniti.com/video/noit-oscon-demo
1. Build it
First, let's install a bunch of things. In the Synaptic Package Manager under Settings -> Repositories -> Other Software, enable the two entries for the partner repositories. Then
sudo apt-get install autoconf build-essential libtool gettext \
    zlib1g-dev uuid-dev libpcre3-dev libssl-dev libpq-dev \
    libxml2-dev libxslt-dev libapr1-dev libaprutil1-dev xsltproc \
    libncurses5-dev libssh2-1-dev libsnmp-dev libmysqlclient-dev \
    subversion sun-java6-jdk
Now we check out reconnoiter from subversion and build it:
svn co https://labs.omniti.com/reconnoiter/trunk reconnoiter
cd reconnoiter
autoconf
./configure
make
sudo mkdir -p /usr/local/java/lib
make
sudo make install
2. Setup the DB
We need PostgreSQL 8.4 server & client. On Karmic you get that via
sudo apt-get install postgresql postgresql-client
For Jaunty, follow the steps here.
Next, make sure that the postgresql config file allows local access without password. Edit the /etc/postgresql/8.4/main/pg_hba.conf to change the local entry to use “trust”:
local all all trust
After that, restart the postgresql server:
sudo /etc/init.d/postgresql-8.4 restart
Now log in to postgresql:
sudo su postgres
cd sql
psql
Within psql do
\i scaffolding.sql
\q
3. Setup cron
First, we need to change the crontab to point to where postgresql is actually installed:
exit
sed -i 's/\/opt\/psql835/\/usr/g' sql/crontab
sudo su postgres
cd sql
We also need to run the commands in the crontab at least once manually as they will initialize certain database structures. As the postgres user:
eval "`cat crontab | cut -d' ' -f6- | grep -v ^$ | awk '{print $0\";\"}'`"
Finally, and still as user postgres, do
crontab crontab
exit
4. Setup the web ui
For configuring the web UI (PHP), we first need Apache2 and PHP:
sudo apt-get install apache2 libapache2-mod-php5 php5-pgsql
This will also enable mod_php5. Every other required module (mod_mime, mod_log_config, mod_rewrite, mod_proxy, mod_proxy_http, mod_authz_host) should already be enabled or even compiled into the server (apache2 -l will show which). To make sure that they are enabled, simply do
sudo a2enmod mime
sudo a2enmod rewrite
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod authz_host
Next, we need the apache configuration, either as a new file /etc/apache2/sites-available/reconnoiter that then should be symlinked into /etc/apache2/sites-enabled, or in the current configuration (e.g. /etc/apache2/sites-enabled/000-default). A sample configuration to set up reconnoiter on port 80:
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    DocumentRoot @ROOT@/ui/web/htdocs

    <Directory "/">
        Options None
        AllowOverride None
        Order allow,deny
        Deny from all
    </Directory>

    <FilesMatch "^\.ht">
        Order allow,deny
        Deny from all
        Satisfy All
    </FilesMatch>

    <Directory "@ROOT@/ui/web/htdocs/">
        php_value include_path @ROOT@/ui/web/lib
        php_value short_open_tag off
        Options FollowSymLinks Indexes
        AllowOverride All
        Order deny,allow
        Allow from all
    </Directory>

    LogLevel warn
    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    ErrorLog @ROOT@/ui/web/logs/error_log
    CustomLog @ROOT@/ui/web/logs/access_log common

    AddType application/x-compress .Z
    AddType application/x-gzip .gz .tgz
    AddType application/x-httpd-php .php

    DefaultType text/plain
</VirtualHost>
Replace @ROOT@ with the directory where you have installed reconnoiter.
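For example, if you went with the separate site file and your checkout lives in /home/user/reconnoiter (adjust the path to your setup), the substitution and the symlink into sites-enabled could be done like this:

# Fill in the actual install path and enable the site.
sudo sed -i 's|@ROOT@|/home/user/reconnoiter|g' /etc/apache2/sites-available/reconnoiter
sudo a2ensite reconnoiter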
If you chose to add reconnoiter to the Apache config on a different port than 80, say 9090, then you will also have to change Apache’s port configuration in /etc/apache2/ports.conf by adding:
NameVirtualHost *:9090
Listen 9090
Then restart apache:
sudo /etc/init.d/apache2 restart
5. Generate test certificates
These steps show how to generate test certificates. In a production environment you would of course use a real CA.
Create/go to a temporary directory:
mkdir ssh-keys
cd ssh-keys
Next, create a file named openssl.cnf in it with this content:
HOME            = .
RANDFILE        = $ENV::HOME/.rnd
oid_section     = new_oids

[ new_oids ]

[ ca ]
default_ca      = CA_default

[ CA_default ]
dir             = ./testCA
certs           = $dir/certs
crl_dir         = $dir/crl
database        = $dir/index.txt
new_certs_dir   = $dir/newcerts
certificate     = $dir/cacert.pem
serial          = $dir/serial
crl             = $dir/crl.pem
private_key     = $dir/private/cakey.pem
RANDFILE        = $dir/private/.rand
x509_extensions = usr_cert
name_opt        = ca_default
cert_opt        = ca_default
default_days    = 365
default_crl_days = 30
default_md      = md5
preserve        = no
policy          = policy_match

[ policy_match ]
countryName             = match
stateOrProvinceName     = match
organizationName        = match
organizationalUnitName  = optional
commonName              = supplied
emailAddress            = optional

[ policy_anything ]
countryName             = optional
stateOrProvinceName     = optional
localityName            = optional
organizationName        = optional
organizationalUnitName  = optional
commonName              = supplied
emailAddress            = optional

[ req ]
default_bits            = 1024
default_keyfile         = privkey.pem
distinguished_name      = req_distinguished_name
attributes              = req_attributes
x509_extensions         = v3_ca
string_mask             = nombstr

[ req_distinguished_name ]
countryName                     = Country Name (2 letter code)
countryName_default             = AU
countryName_min                 = 2
countryName_max                 = 2
stateOrProvinceName             = State or Province Name (full name)
stateOrProvinceName_default     = Some-State
localityName                    = Locality Name (eg, city)
0.organizationName              = Organization Name (eg, company)
0.organizationName_default      = Internet Widgits Pty Ltd
organizationalUnitName          = Organizational Unit Name (eg, section)
commonName                      = Common Name (eg, YOUR name)
commonName_max                  = 64
emailAddress                    = Email Address
emailAddress_max                = 64

[ req_attributes ]
challengePassword               = A challenge password
challengePassword_min           = 4
challengePassword_max           = 20
unstructuredName                = An optional company name

[ usr_cert ]
basicConstraints        = CA:FALSE
nsComment               = "OpenSSL Generated Certificate"
subjectKeyIdentifier    = hash
authorityKeyIdentifier  = keyid,issuer:always

[ v3_req ]
basicConstraints        = CA:FALSE
keyUsage                = nonRepudiation, digitalSignature, keyEncipherment

[ v3_ca ]
subjectKeyIdentifier    = hash
authorityKeyIdentifier  = keyid:always,issuer:always
basicConstraints        = CA:true

[ crl_ext ]
authorityKeyIdentifier  = keyid:always,issuer:always

[ proxy_cert_ext ]
basicConstraints        = CA:FALSE
nsComment               = "OpenSSL Generated Certificate"
subjectKeyIdentifier    = hash
authorityKeyIdentifier  = keyid,issuer:always
proxyCertInfo           = critical,language:id-ppl-anyLanguage,pathlen:3,policy:foo
Next execute these commands:
mkdir testCA
touch testCA/index.txt
test -f testCA/serial || echo 00 > testCA/serial

# CA
openssl genrsa -out test-ca.key
openssl req -key test-ca.key -days 365 \
    -new -out test-ca.csr -config openssl.cnf \
    -subj "/C=US/ST=California/O=Ning Inc./CN=Reconnoiter Test CA"
openssl x509 -req -in test-ca.csr -signkey test-ca.key \
    -out test-ca.crt

# noit
openssl genrsa -out test-noit.key
openssl req -key test-noit.key -days 365 \
    -new -out test-noit.csr -config openssl.cnf \
    -subj "/C=US/ST=California/O=Ning Inc./CN=noit-test"
openssl ca -batch -config openssl.cnf \
    -in test-noit.csr -out test-noit.crt \
    -outdir . -keyfile test-ca.key -cert test-ca.crt -days 120

# stratcon
openssl genrsa -out test-stratcon.key
openssl req -key test-stratcon.key -days 365 \
    -new -out test-stratcon.csr -config openssl.cnf \
    -subj "/C=US/ST=California/O=Ning Inc./CN=stratcon"
openssl ca -batch -config openssl.cnf \
    -in test-stratcon.csr -out test-stratcon.crt \
    -outdir . -keyfile test-ca.key -cert test-ca.crt -days 120
This will create a bunch of .pem, .crt, .csr, and .key files that you should copy to /usr/local/etc:
sudo cp *.pem *.crt *.csr *.key /usr/local/etc
6. Setup a noit daemon
Generate the config:
sudo cp src/noit.conf /usr/local/etc/
Now you can edit that file to your heart's content. Some things to note:
- Comment out/remove sections as necessary, or make sure that they point to existing machines.
- For every new item, create a new uuid using the uuidgen tool that was installed earlier.
- Update the sslconfig section to use the test certificates:
<sslconfig>
    <optional_no_ca>false</optional_no_ca>
    <certificate_file>/usr/local/etc/test-noit.crt</certificate_file>
    <key_file>/usr/local/etc/test-noit.key</key_file>
    <ca_chain>/usr/local/etc/test-ca.crt</ca_chain>
</sslconfig>
- For snmp entries, make sure you have the community set correctly (see https://labs.omniti.com/docs/reconnoiter/ch05s14.html).
Finally start the noit daemon:
sudo /usr/local/sbin/noitd -c /usr/local/etc/noit.conf -D
The -D option is for debugging purposes. It will tell noitd to run in the foreground and log everything to stdout/stderr. You also might want to tweak the logging settings in the configuration file. Turn on debug logging by changing this line near the top of the config file:
<log name="debug" disabled="true"/>
to
<log name="debug"/>
Then switch on debug logging for whichever specific modules you want. E.g. for snmp debug logging, change this line further down in the config file:
<log name="debug/snmp" disabled="true"/>
to
<log name="debug/snmp"/>
7. Setup a stratcon daemon
Again, create the config file using the sample config file:
sudo cp src/stratcon.conf /usr/local/etc/
Edit as necessary:
- Logging is configured in the same way as for noit above.
- Set the password in the database config section to stratcon (or whatever you chose in the scaffolding.sql above).
- For each noitd instance there needs to be a noitd section.
- Configure the listeners section, esp. the port (should be an unused one), the hostname and document_domain.
- Update the sslconfig sections (there are two of them, one in the noits section and one in the listeners section) to use the test certificates:
<sslconfig>
    <key_file>/usr/local/etc/test-stratcon.key</key_file>
    <certificate_file>/usr/local/etc/test-stratcon.crt</certificate_file>
    <ca_chain>/usr/local/etc/test-ca.crt</ca_chain>
</sslconfig>
Finally start the stratcon daemon:
sudo /usr/local/sbin/stratcond -c /usr/local/etc/stratcon.conf -D
Again, the -D option is for debugging. You can tweak the logging settings in pretty much the same way as for noitd.
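Before moving on to the UI verification below, a quick sanity check from another terminal that both daemons are actually up and have bound their listeners (the ports depend on what you configured in the two config files):

# Both daemons should appear in the process list.
ps aux | grep -E '[n]oitd|[s]tratcond'
# And their listeners should be bound.
sudo netstat -tlnp | grep -E 'noitd|stratcond'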
8. Verification
In your browser (note that the UI doesn't quite work in Chrome), go to http://localhost:9090 (or whichever port you configured the Apache vhost on). The reconnoiter UI should appear. On the left side, click the + next to "Graph Controls" and then on "Browse Data". The data that you configured for noitd above should show up, though it might take a few minutes between starting noitd and the first data showing up.
Relevant logs are:
- /var/log/postgresql/postgresql-8.4-main.log
- /tmp/rollup.log – the log created by the cron rollup job
- /var/log/syslog
- @ROOT@/ui/web/logs/error_log and @ROOT@/ui/web/logs/access_log