what is yrtg? |
yrtg is a patch that contains Yahoo!'s modifications to the nifty
rtg poller. the requirements for this project arose from our need
to support snmp polling across a variety of datacenter networks of
diverse sizes. rtg is
lightweight enough to use in our small installations and powerful
enough to poll our large datacenters (even with small poll
intervals). most of the modifications were done to fix scaling
issues we ran into along the way.
yrtg is still a work in progress but it is currently deployed in
production polling hundreds of thousands of targets on thousands of
devices. |
what is yrtg not? |
yrtg is not officially supported. yrtg definitely is
not a Yahoo! product. yrtg is not versioned much beyond
the timestamps on the patches. yrtg is not guaranteed in any
way. yrtg is not a standalone product. |
so i would be running what Yahoo! runs?
|
more or less.
active development is done on an internal branch and then those
changes are integrated (using perforce's baseless merge) into a
public branch after completion. additionally, the private version
has some uninteresting changes to some compile time defaults done
for packaging and tuning. |
what value does yrtg add to rtg? |
- PERFORMANCE:
- "sqlbufs": sql statements are buffered per-table and
multiple rows are inserted in bulk instead of per-target. these
buffers are only flushed when either the poll interval has ended or
when the per-packet limit (dynamically obtained from mysql) is
reached. mysqld disk access and cpu usage are significantly reduced.
(N.B.: mysql only).
- per-thread mysql connections have been removed. this allows for
more poller threads without consuming excessive resources on the
mysql server.
- INSERT DELAYED is used to spend less time blocked in a
mysql_query() call which can prevent the next poll interval from
starting (and the snowball effect which results from that).
- per-device snmp sessions are created once when hosts are read
in instead of each time a target is polled.
- FNV hash is used for even distribution of targets across the
hash buckets. if needed in the future, a 1:1 bucket:thread ratio
could be used which would reduce lock contention on threads
acquiring a target.
- the target hash size is now a runtime tunable (-s [buckets])
instead of a compile time #define.
- a plumber thread runs to flush sqlbufs to the database in the
background. buffers can hold two rounds of data for as many targets
as point at the table. buffers that contain an entire poll round
of data (or more) are flushed (or in the case of a downed db,
resized) synchronously at the end of a poll round. buffers with
less than a round of data are flushed in the background. (N.B.:
mysql only).
- snmp OID strings are compiled for the snmp library once on target
insertion instead of each time a target is polled.
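to make the sqlbuf idea above a bit more concrete, here is a rough sketch of a per-table buffer that collects rows into one bulk insert and flushes itself when the next row would not fit. every name in it (struct sqlbuf, sqlbuf_add(), SQLBUF_MAX, the column names) is made up for illustration and is not taken from rtgsqlbuf.c. the fixed 256 byte limit stands in for the per-packet limit that yrtg obtains from mysql, and the flush only resets the buffer where the real code would send the query.

```c
#include <stdio.h>
#include <string.h>

#define SQLBUF_MAX 256  /* assumption: stand-in for the server's packet limit */

/* hypothetical per-table buffer; not the layout used by rtgsqlbuf.c */
struct sqlbuf {
    char   data[SQLBUF_MAX]; /* pending bulk INSERT statement */
    size_t len;              /* bytes currently used in data */
    size_t preamble_len;     /* length of "INSERT ... VALUES " */
    int    rows;             /* rows buffered since the last flush */
    int    flushes;          /* number of flushes so far */
};

/* write the per-table preamble once; column names are assumed */
static int sqlbuf_init(struct sqlbuf *sb, const char *table)
{
    int n = snprintf(sb->data, sizeof(sb->data),
        "INSERT DELAYED INTO %s (id, dtime, counter) VALUES ", table);

    if (n < 0 || (size_t)n >= sizeof(sb->data))
        return -1;
    sb->len = sb->preamble_len = (size_t)n;
    sb->rows = 0;
    sb->flushes = 0;
    return 0;
}

/* "send" the statement, then rewind to just after the preamble */
static void sqlbuf_flush(struct sqlbuf *sb)
{
    if (sb->rows == 0)
        return;
    /* a real poller would hand sb->data to the database here */
    sb->len = sb->preamble_len;
    sb->data[sb->len] = '\0';
    sb->rows = 0;
    sb->flushes++;
}

/* append one row, flushing first if it would overflow the buffer */
static int sqlbuf_add(struct sqlbuf *sb, unsigned id, long when,
    unsigned long long value)
{
    char row[96];
    size_t need;
    int n = snprintf(row, sizeof(row), "(%u, FROM_UNIXTIME(%ld), %llu)",
        id, when, value);

    if (n < 0 || (size_t)n >= sizeof(row))
        return -1;
    need = (size_t)n + (sb->rows ? 2 : 0);      /* 2 bytes for ", " */
    if (sb->len + need >= sizeof(sb->data)) {
        sqlbuf_flush(sb);
        need = (size_t)n;
    }
    if (sb->len + need >= sizeof(sb->data))
        return -1;                              /* row can never fit */
    if (sb->rows) {
        memcpy(sb->data + sb->len, ", ", 2);
        sb->len += 2;
    }
    memcpy(sb->data + sb->len, row, (size_t)n + 1); /* copies the NUL too */
    sb->len += (size_t)n;
    sb->rows++;
    return 0;
}
```

a poller thread would call sqlbuf_add() once per polled target and sqlbuf_flush() at the end of the poll interval, so the database sees a handful of large inserts instead of one query per target.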
- STABILITY:
- targets that report a hard failure condition or those that
time out twice in a row are removed to prevent them from stalling
the poll round.
- only one snmp query will be sent to a host at a time.
this avoids snmp timeouts and stressing the target's cpu.
increasing the thread count is now a bit less hazardous to target
hosts. exception: if the same host is configured more than once in
targets.cfg (possibly to use a different snmp version or community).
- if the database server is down at the time of data insertion,
rtgpoll will buffer the inserts until the connection comes back. it
will retry once per poll interval. it will try indefinitely until
the poller process runs out of memory at which point it frees the
buffers. (N.B.: mysql only).
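the first stability rule above (drop a target on a hard failure or on two timeouts in a row) boils down to a small piece of per-target state. the sketch below shows one way to express it. the enum, the struct, and target_record() are hypothetical names chosen for the example, and the assumption that a successful poll resets the timeout count is mine. the real rtgpoll structures are not described in this faq.

```c
#include <stdbool.h>

/* hypothetical result codes for a single poll of a single target */
enum poll_result { POLL_OK, POLL_TIMEOUT, POLL_HARD_FAIL };

/* hypothetical per-target state */
struct target {
    int  timeouts_in_a_row; /* consecutive timeouts seen so far */
    bool active;            /* false once removed from the poll round */
};

/*
 * record the outcome of one poll.
 * returns true if the target stays in the round, false if it was removed.
 */
static bool target_record(struct target *t, enum poll_result r)
{
    switch (r) {
    case POLL_OK:
        t->timeouts_in_a_row = 0;       /* assumed: success resets the streak */
        break;
    case POLL_TIMEOUT:
        if (++t->timeouts_in_a_row >= 2)
            t->active = false;          /* second timeout in a row */
        break;
    case POLL_HARD_FAIL:
        t->active = false;              /* removed immediately */
        break;
    }
    return t->active;
}
```

a single timeout followed by a good poll leaves the target in place, so one dropped packet does not cost a target, while a dead host stops stalling the round after its second miss.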
- FEATURES:
- snmp port number is configurable per-host instead of as a
global value.
- snmp query timeout length and retry counts can now be changed
from the net-snmp default and are configurable per-host.
- mysql's tcp port (DB_Port) and unix domain socket (DB_Socket)
are configurable in rtg.conf.
- CODE & STYLE CLEANUP:
- various buglets (example: predicates which evaluated to
impossible or guaranteed conditions) have been squashed and other
minor nits have been picked.
- linked list macros were stolen from FreeBSD's <sys/queue.h> and
have been used to replace several handrolled structures.
- all freshly written and/or standalone code (rtgsqlbuf.c,
rtghash.c) uses a style resembling KNF from FreeBSD's
style(9) guide.
for changes to original rtg code, existing style patterns were
preserved when it was possible to ascertain them.
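since rtghash.c comes up here, this is roughly what FNV-based bucket selection looks like. the sketch uses the 32-bit FNV-1a variant, which is an assumption: this faq only says "FNV hash" and does not name the variant, the width, or the key that gets hashed. fnv1a_32() and target_bucket() are illustrative names and not functions from rtghash.c.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV32_OFFSET 2166136261u /* standard 32-bit FNV offset basis */
#define FNV32_PRIME  16777619u   /* standard 32-bit FNV prime */

/* 32-bit FNV-1a: xor in each byte, then multiply by the prime */
static uint32_t fnv1a_32(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t h = FNV32_OFFSET;

    while (len--) {
        h ^= *p++;
        h *= FNV32_PRIME;
    }
    return h;
}

/* map a key to a bucket; nbuckets plays the role of the -s tunable */
static size_t target_bucket(const char *key, size_t klen, size_t nbuckets)
{
    return fnv1a_32(key, klen) % nbuckets;
}
```

because the bucket count is a plain argument instead of a #define, the same code works for any value passed on the command line.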
|
are other features being planned? |
i am no longer working on yrtg. these were on the to-do list:
- pre-allocate snmp pdus
- coalesce pdus going to the same device
- insert internal poll statistics into a database table per-round
- add support for polling snmp tables with a single clause in the
targets file. could be configured using the root oid of the table
and an index:rtgid map that could be shared between tables
referencing the same index (like IF-MIB/ifXTable).
- error/warning/syslog/fprintf() consistency
- state cleanup to the targets.cfg lexer
- pool of database handles of a configurable count
- support for connecting to multiple database servers
|
how do i download and install yrtg? |
after downloading yrtg.patch, apply it
by executing:
$ cd /path/to/rtg && patch < /path/to/yrtg.patch
following that, run 'autoreconf' (or 'automake' and 'autoconf') in
the top level directory. compilation, installation and usage are
the same as the standard rtg once the patch is applied and
'autoreconf' has been run. |
why is the patch so huge? |
the yrtg changes touch every part of rtgpoll. the two largest
changes in the patch are a complete rewrite of the target hash code
and the addition of an sql buffering system. a lot of code cleanup
was also done in the process. |
do you have a patch just for feature X? |
nope. at one point i was keeping individual branches for each
major project, but i now just maintain one public branch used to
generate the patch. i have several reasons for doing this: avoiding
overly complex patch dependencies, spending less time repeatedly
merging in both directions, and lack of access to the upstream
cvs. |
what if the patch doesn't apply cleanly? |
make sure your rtg directory is from a recent cvs (not a
packaged release). if the patch fails with an up-to-date cvs tree,
please report it to me as a bug.
alternatively, check out the rtg source from circa 2005, which is when the patch was generated.
|
who do i report a bug or crash to? |
if possible, it would simplify debugging to determine whether the
bug also exists in a clean rtg installation without the patch. if
so, report details of the bug to the rtg mailing list. if you
discover that the bug is yrtg-specific, please send me mail and i'll
fix it.
when reporting a bug (to either the list or to me), please provide
lots of juicy details (such as configure/compile logs,
stdout/stderr, syslog'd messages, targets.cfg, rtg.conf file). if
reporting a crash, it would be helpful to recompile with debug
symbols and send gdb backtraces. |
what about feature requests? |
in addition to the rtg mailing list, you can always drop me a line with new ideas.
no promises that i can write every requested feature, but
suggestions are more than welcome. also, i only use and write code
for rtgpoll, not rtgplot. |
why isn't there postgresql support for sqlbuf?
|
nothing at all against pgsql, but i don't use it. sorry. the
sqlbuf code is fairly clean and adding support for pgsql shouldn't
actually be much work. an ambitious hacker who was interested in
writing support for this would only need to read rtgsqlbuf.c, find
and eradicate any mysqlisms in the generic functions, and write two
database dependent functions.
want to take it on?
write one function (see: sqlbuf_mysql_cfg()) that calculates the
sizes of the largest allowed query, the initial preamble to each
insert query per-table, and the largest "values" string appended
per-table. the other function (see: sqlbuf_mysql_flush())
is the code needed to flush the sqlbuf out to the database and
handle any errors that could arise. |
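for anyone tempted, the size calculation half might look something like the sketch below. it is only a guess at the shape: struct sqlbuf_limits, example_cfg(), example_rows_per_query(), the column names, and the idea of passing the server's limit in as a plain number are all assumptions made for the example. none of this is copied from sqlbuf_mysql_cfg(), and a real backend would ask its own database for the query size limit.

```c
#include <stdio.h>
#include <stddef.h>

/* hypothetical result of the per-table size calculation */
struct sqlbuf_limits {
    size_t max_query;      /* largest query the server will accept */
    size_t preamble_len;   /* "INSERT INTO <table> (...) VALUES " */
    size_t max_values_len; /* worst-case ", (id, dtime, counter)" tuple */
};

/*
 * fill in the three sizes for one table.
 * server_max is assumed to have been fetched from the database already.
 * returns 0 on success, -1 if not even one row would fit.
 */
static int example_cfg(const char *table, size_t server_max,
    struct sqlbuf_limits *out)
{
    int n;

    /* snprintf(NULL, 0, ...) returns the length without writing anything */
    n = snprintf(NULL, 0, "INSERT INTO %s (id, dtime, counter) VALUES ",
        table);
    if (n < 0)
        return -1;
    out->preamble_len = (size_t)n;

    /* widest tuple: 32-bit id, 64-bit timestamp, 64-bit counter */
    n = snprintf(NULL, 0, ", (%u, %llu, %llu)", 4294967295u,
        18446744073709551615ull, 18446744073709551615ull);
    if (n < 0)
        return -1;
    out->max_values_len = (size_t)n;

    out->max_query = server_max;
    if (out->preamble_len + out->max_values_len >= server_max)
        return -1;
    return 0;
}

/* how many worst-case rows fit into a single query */
static size_t example_rows_per_query(const struct sqlbuf_limits *lim)
{
    return (lim->max_query - lim->preamble_len) / lim->max_values_len;
}
```

the flush half is where the database client library comes in, so it is left out here. it would send the buffered statement and decide what to do with the buffer when the server is unreachable.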