Firewall Ruleset Optimization


Ideally, the operation of a packet filter should not affect legitimate network traffic. Packets violating the filtering policy should be blocked, and compliant packets should pass the device as if the device wasn't there at all.

In reality, several factors limit how well a packet filter can achieve that goal. Packets have to pass through the device, adding some amount of latency between the time a packet is received and the time it is forwarded. Any device can only process a finite amount of packets per second. When packets arrive at a higher rate than the device can forward them, packets are lost.

Most protocols, like TCP, deal well with added latency. You can achieve high TCP transfer rates even over links with several hundred milliseconds of latency. On the other hand, in interactive network gaming even a few tens of milliseconds are usually perceived as too much. Packet loss is generally the worse problem: TCP performance deteriorates seriously when a significant number of packets is lost.

This article explains how to identify when pf is becoming the limiting factor in network throughput and what can be done to improve performance in this case.

The significance of packet rate

One commonly used unit to compare network performance is throughput in bytes per second. But this unit is completely inadequate for measuring pf performance. The real limiting factor isn't throughput but packet rate, that is, the number of packets per second the host can process. The same host that handles 100Mbps of 1500-byte packets without breaking a sweat can be brought to its knees by a mere 10Mbps of 40-byte packets. The former amounts to only 8,000 packets/second, but the latter traffic stream amounts to 32,000 packets/second, which causes roughly four times the amount of work for the host.
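As a sanity check, these figures follow from simple arithmetic; the shell is used here purely as a calculator, with the round link rates from above:

```shell
# packets/s = link rate in bits/s, divided by 8 bits per byte,
# divided by the packet size in bytes
echo $(( 100000000 / 8 / 1500 ))   # 100Mbps of 1500-byte packets: ~8,333 packets/s
echo $(( 10000000 / 8 / 40 ))      # 10Mbps of 40-byte packets: ~31,250 packets/s
```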

To understand this, let's look at how packets actually pass through the host. Packets are received from the wire by the network interface card (NIC) and read into a small memory buffer on the NIC. When that buffer is full, the NIC triggers a hardware interrupt, causing the NIC driver to copy the packets into network memory buffers (mbufs) in kernel memory. The packets are then passed through the TCP/IP stack in the form of these mbufs. Once a packet is transferred into an mbuf, most operations the TCP/IP stack performs on the packet do not depend on the packet size, as these operations only inspect the packet headers and not the payload. This is also true for pf, which gets passed one packet at a time and makes the decision of whether to block it or pass it on. If the packet needs forwarding, the TCP/IP stack will pass it to a NIC driver, which will extract the packet from the mbuf and put it back onto the wire.

Most of these operations have a comparatively high cost per packet, but a very low cost per size of the packet. Hence, processing a large packet is only slightly more expensive than processing a small packet.

Some limits are based on hardware and software outside of pf. For instance, i386-class machines are not able to handle much more than 10,000 interrupts per second, no matter how fast the CPU is, due to architectural constraints. Some network interface cards will generate one interrupt for each packet received. Hence, the host will start to lose packets when the packet rate exceeds around 10,000 packets per second. Other NICs, like more expensive gigabit cards, have larger built-in memory buffers that allow them to bundle several packets into one interrupt. Hence, the choice of hardware can impose some limits that no optimization of pf can surpass.

When pf is the bottleneck

The kernel passes packets to pf sequentially, one after the other. While pf is being called to decide the fate of one packet, the flow of packets through the kernel is briefly suspended. During that short period of time, further packets read off the wire by NICs have to fit into memory buffers. If pf evaluations take too long, packets will quickly fill up the buffers, and further packets will be lost. The goal of optimizing pf rulesets is to reduce the amount of time pf spends for each packet.

An interesting exercise is to intentionally push the host into this overloaded state by loading a very large ruleset like this:

  $ i=0; while [ $i -lt 100 ]; do \
      printf "block from any to %d.%d.%d.%d\n" \
        `jot -r -s " " 4 1 255`; \
      let i=i+1; \
    done | pfctl -vf -

  block drop inet from any to 32.146.180.191
  block drop inet from any to 155.3.88.46
  block drop inet from any to 204.77.194.118

This represents a worst-case ruleset that defies all automatic optimizations. Because each rule contains a different random non-matching address, pf is forced to traverse the entire ruleset and evaluate each rule for every packet. Loading a ruleset that solely consists of thousands of such rules, and then generating a steady flow of packets that must be filtered, inflicts noticeable load on even the fastest machine. While the host is under load, check the interrupt rate with:

  $ vmstat -i

And watch CPU states with:

  $ top

This will give you an idea of how the host reacts to overloading, and will help you spot similar symptoms when using your own ruleset. You can use the same tools to verify effects of optimizations later on.

Then try the other extreme and completely disable pf with:

  $ pfctl -d

Then compare the vmstat and top values.

This is a simple way to get a rough estimate and upper limit on what to realistically expect from optimization. If your host handles your traffic with pf disabled, you can aim to achieve similar performance with pf enabled. However, if the host already shows problems handling the traffic with pf disabled, optimizing pf rulesets is probably pointless, and other components should be changed first.

If you already have a working ruleset and are wondering whether you should spend time on optimizing it for speed, repeat this test with your ruleset and compare the results with both extreme cases. If running your ruleset shows effects of overloading, you can use the guidelines below to reduce those effects.

In some cases, the ruleset shows no significant amount of load on the host, yet connections through the host show unexpected problems, like delays during connection establishment, stalling connections or disappointingly low throughput. In most of these cases, the problem is not filtering performance at all, but a misconfiguration of the ruleset which causes packets to get dropped. See Testing Your Firewall for how to identify and deal with such problems.

And finally, if your ruleset is evaluated without causing significant load and everything works as expected, the most reasonable conclusion is to leave the ruleset as it is. Often, rulesets written in a straightforward way without respect for performance are evaluated efficiently enough to cause no packet loss. Manual optimizations will only make the ruleset harder to read for the human maintainer, while having an insignificant effect on performance.

Filter statefully

The amount of work done by pf mainly consists of two kinds of operations: ruleset evaluations and state table lookups.

For every packet, pf first does a state table lookup. If a matching state entry is found in the state table, the packet is immediately passed. Otherwise pf evaluates the filter ruleset to find the last matching rule for the packet which decides whether to block or pass it. If the rule passes the packet, it can optionally create a state entry using the 'keep state' option.

When filtering statelessly, without using 'keep state' to create state entries for connections, every packet causes an evaluation of the ruleset, and ruleset evaluation is the single most costly operation pf performs in this scenario. Each packet still causes a state table lookup, but since the table is empty, the cost of the lookup is basically zero.

Filtering statefully means using 'keep state' in filter rules, so packets matching those rules will create a state table entry. Further packets related to the same connections will match the state table entries and get passed automatically, without evaluations of the ruleset. In this scenario, only the first packet of each connection causes a ruleset evaluation, and subsequent packets only cause a state lookup.
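A minimal sketch of such a stateful ruleset, assuming a hypothetical external interface fxp0 and a web server as the only inbound service:

```
# block everything by default
block all
# create state for all outgoing connections
pass out on fxp0 keep state
# allow and track incoming connections to the web server
pass in on fxp0 inet proto tcp from any to any port www keep state
```

With a ruleset shaped like this, only the first packet of each connection is matched against the rules; all subsequent packets are handled by the much cheaper state table lookup.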

Now, a state lookup is much cheaper than a ruleset evaluation. A ruleset is basically a list of rules which must be evaluated from top to bottom. The cost increases with every rule in the list, twice as many rules mean twice the amount of work. And evaluating a single rule can cause comparison of numerous values in the packet. The state table, on the other hand, is a tree. The cost of lookup increases only logarithmically with the number of entries, twice as many states mean only one additional comparison, a fraction of additional work. And comparison is needed only for a limited number of values in the packet.

There is some cost to creating and removing state entries. But assuming the state will match several subsequent packets and saves ruleset evaluation for them, the sum is much cheaper. For specific connections like DNS lookups, where each connection only consists of two packets (one request and one reply), the overhead of state creation might be worse than two ruleset evaluations. Connections that consist of more than a handful of packets, like most TCP connections, will benefit from the created state entry.

In short, you can make ruleset evaluation a per-connection cost instead of a per-packet cost. This can easily make a difference of a factor of 100 or more. For example, I see the following counters when I run:

  $ pfctl -si

  State Table                          Total             Rate
    searches                       172507978          887.4/s
    inserts                          1099936            5.7/s
    removals                         1099897            5.7/s
    match                            6786911           34.9/s

This means pf gets called about 900 times per second. I'm filtering on multiple interfaces, so that would mean I'm forwarding about 450 packets per second, each of which gets filtered twice, once on each interface it passes through. But ruleset evaluation occurs only about 35 times per second, and state insertions and deletions only 6 times per second. With anything but a tiny ruleset, this is very well worth it.
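Dividing the sample counters above gives a rough feel for the savings; shell integer arithmetic is used, so the results are truncated:

```shell
# state table searches per ruleset match, from the sample counters above:
# only about one packet in 25 causes a ruleset evaluation
echo $(( 172507978 / 6786911 ))
# searches per state insertion: each state entry served roughly 156 packets,
# each of which skipped the ruleset entirely
echo $(( 172507978 / 1099936 ))
```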

To make sure that you're really creating state for each connection, search for 'pass' rules which don't use 'keep state', like in:

  $ pfctl -sr | grep pass | grep -v 'keep state'

Make sure you have a tight 'block by default' policy, as otherwise packets might pass not only due to explicit 'pass' rules, but mismatch all rules and pass by default.

The downside of stateful filtering

The only downside to stateful filtering is that state table entries need memory, around 256 bytes for each entry. When pf fails to allocate memory for a new state entry, it blocks the packet that should have created the state entry instead, and increases an out-of-memory counter shown by:

  $ pfctl -si
    memory                                 0            0.0/s

Memory for state entries is allocated from the kernel memory pool called 'pfstatepl'. You can use vmstat(8) to show various aspects of pool memory usage:

  $ vmstat -m
  Memory resource pool statistics
  Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
  pfstatepl    256  1105099    0  1105062   183   114    69   127     0 625   62

The difference between 'Requests' and 'Releases' equals the number of currently allocated state table entries, which should match the counter shown by:

  $ pfctl -si
  State Table                          Total             Rate
    current entries                       36

Other counters shown by pfctl can get reset by pfctl -Fi.

Not all memory of the host is available to the kernel, and the way the amount of physical RAM affects the amount available to the kernel depends on architecture and kernel options and version. As of OpenBSD 3.6, an i386 kernel can use up to 256MB of memory. Prior to 3.6, that limit was much lower for i386. You could have 8GB of RAM in your host, and still pf would fail to allocate memory beyond a small fraction of that amount.

To make matters worse, when pf really hits the limit where pool_get(9) fails, the failure is not as graceful as one might wish. Instead, the entire system becomes unstable after that point, and eventually crashes. This really isn't pf's fault, but a general problem with kernel pool memory management.

To address this, pf itself limits the number of state entries it will allocate at the same time, using pool_sethardlimit(9), also shown by vmstat -m output. The default for this limit is 10,000 entries, which is safe for any host. The limit can be printed with:

  $ pfctl -sm
  states     hard limit  10000
  src-nodes  hard limit  10000
  frags      hard limit    500

If you need more concurrent state entries, you can increase the limit in pf.conf with, for instance:

  set limit states 50000

The problem is determining a large value that is still safe enough not to trigger a pool allocation failure. This is still a sore topic, as there is no simple formula to calculate the value. Basically, you have to increase the limit, then verify that the host remains stable after reaching it, by artificially creating that many entries.

On the bright side, if you have 512MB or more of RAM, you can now use 256MB for the kernel, which should be safe for at least 500,000 state entries. And most people consider that a lot of concurrent connections. Just imagine each of those connections generating just one packet every ten seconds, and you end up with a packet rate of 50,000 packets/s.
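The arithmetic behind that estimate, assuming roughly 256 bytes per state entry:

```shell
# memory needed for 500,000 state entries at ~256 bytes each
echo $(( 256 * 500000 ))             # in bytes
echo $(( 256 * 500000 / 1048576 ))   # in MB: about 122MB, well under 256MB
```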

More likely, you don't expect that many states at all. But whatever your state limit is, there are cases where it will be reached, like during a denial-of-service attack. Remember, pf fails closed, not open, when state creation fails. An attacker could create state entries until the limit is reached, just for the purpose of denying service to legitimate users.

There are several ways to deal with this problem.

You can limit the number of states created from specific rules, for instance like:

  pass in from any to $ext_if port www keep state (max 256)

This would limit the number of concurrent connections to the web server to 256, while other rules could still create state entries. Similarly, the maximum number of connections per source address can be restricted with:

  pass keep state (source-track rule, max-src-states 16)

Once a state entry is created, various timeouts define when it is removed. For instance:

  $ pfctl -st
  tcp.opening                  30s

The timeout for TCP states that are not yet fully established is set to 30 seconds. These timeouts can be lowered to remove state entries more aggressively. Individual timeout values can be set globally in pf.conf:

  set timeout tcp.opening 20

They can also be set in individual rules, applying only to states created by these rules:

  pass keep state (tcp.opening 10)

There are several pre-defined sets of global timeouts which can be selected in pf.conf:

  set optimization aggressive

Also, there are adaptive timeouts: these timeouts are not constants, but variables which adjust to the number of state entries allocated. For instance:

  set timeout { adaptive.start 6000, adaptive.end 12000 }

pf will use constant timeout values as long as there are less than 6,000 state entries. When there are between 6,000 and 12,000 entries, all timeout values are linearly scaled from 100% at 6,000 to 0% at 12,000 entries, i.e. with 9,000 entries all timeout values are reduced to 50%.
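The scaling can be verified with shell arithmetic; 86,400 seconds (the default tcp.established timeout) is used here as an example value:

```shell
start=6000 end=12000 states=9000 timeout=86400
# scaled timeout = timeout * (end - states) / (end - start)
echo $(( timeout * (end - states) / (end - start) ))   # 50% of 86400 at 9,000 entries
```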

In summary, you probably can specify a number of maximum states you expect to support. Set this as the limit for pf. Expect the limit to get reached during certain attacks, and define a timeout strategy for this case. In the worst case, pf will drop packets when state insertion fails, and the out-of-memory counter will increase.

Ruleset evaluation

A ruleset is a linear list of individual rules, which are evaluated from top to bottom for a given packet. Each rule either does or does not match the packet, depending on the criteria in the rule and the corresponding values in the packet.

Therefore, to a first approximation, the cost of ruleset evaluation grows with the number of rules in the ruleset. This is not precisely true for reasons we'll get into soon, but the general concept is correct. A ruleset with 10,000 rules will almost certainly cause much more load on your host than one with just 100 rules. The most obvious optimization is to reduce the number of rules.

Ordering rulesets to maximize skip steps

The first reason why ruleset evaluation can be cheaper than evaluating each individual rule in the ruleset is called skip steps. This is a transparent and automatic optimization done by pf when the ruleset is loaded. It's best explained with an example. Imagine you have the following simple ruleset:

  1. block in all
  2. pass in on fxp0 proto tcp from any to 10.1.2.3 port 22 keep state
  3. pass in on fxp0 proto tcp from any to 10.1.2.3 port 25 keep state
  4. pass in on fxp0 proto tcp from any to 10.1.2.3 port 80 keep state
  5. pass in on fxp0 proto tcp from any to 10.2.3.4 port 80 keep state

A TCP packet arrives in on fxp0 to destination address 10.2.3.4 on some port.

pf will start the ruleset evaluation for this packet with the first rule, which fully matches. Evaluation continues with the second rule, which matches the criteria 'in', 'on fxp0', 'proto tcp' and 'from any', but doesn't match 'to 10.1.2.3'. So the rule does not match, and evaluation should continue with the third rule.

But pf is aware that the third and fourth rule also specify the same criterion 'to 10.1.2.3' which caused the second rule to mismatch. Hence, it is absolutely certain that the third and fourth rule cannot possibly match this packet either, and immediately jumps to the fifth rule, saving several comparisons.

Imagine the packet under inspection was UDP instead of TCP. The first rule would have matched, and evaluation would have continued with the second rule. There, the criterion 'proto tcp' would have made the rule mismatch the packet. Since the subsequent rules also specify the criterion 'proto tcp' which was found to mismatch the packet, all of them could be safely skipped, without affecting the outcome of the evaluation.

Here's how pf analyzes your ruleset when you load it. Each rule can contain a list of criteria, like a 'to' address restricting the rule to packets with that destination address. For each criterion in each rule, pf counts the number of rules immediately below that rule which specify the exact same criterion. This can be zero, when the next rule does not use the exact same criterion. Or it can be any number up to the number of remaining rules, when they all specify it. The counted numbers are stored in memory for later use. They're called skip steps because they tell pf how many subsequent steps (rules) can be skipped when any criterion in any rule is found not to match the packet being inspected.

Rule evaluation compares the criteria in the rule against the values in the packet in a fixed order:

  1. interface ('on fxp0')
  2. direction ('in', 'out')
  3. address family ('inet' or 'inet6')
  4. protocol ('proto tcp')
  5. source address ('from')
  6. source port ('from port < 1024')
  7. destination address ('to')
  8. destination port ('to port 80')

If the rule completely matches, evaluation continues on the very next rule. If the rule does not match, the first criterion from the list above which mismatches decides which skip step is used. There might be more than one criterion which mismatches, but only the first one, in the order of the list above, matters.

Obviously, the order of rules in your ruleset affects the skip step values calculated for each rule. For instance:

  1. pass on fxp0
  2. pass on fxp1
  3. pass on fxp0
  4. pass on fxp1

This ruleset will produce skip steps with value zero for the interface criterion in each rule, because no adjacent rules contain the same interface criterion.

Those rules could instead be ordered like:

  1. pass on fxp0
  2. pass on fxp0
  3. pass on fxp1
  4. pass on fxp1

The skip step value for the interface criterion would then equal one in the first and third rule.

This makes a small difference when the ruleset is evaluated for a packet on fxp2. Before the reordering, all four rules are evaluated because none of them can be skipped. After the reordering, only rules one and three need to be evaluated, and rules two and four can be skipped.

The difference may be insignificant in this little example, but imagine a ruleset containing 1,000 rules which all apply only to two different interfaces. If you order these rules so all rules applying to one interface are adjacent, followed by the rules applying to the other interface, pf can reliably skip 500 rules in each and every evaluation of the ruleset, reducing the cost of ruleset evaluation to 50%, no matter what packets your traffic consists of.

Hence, you can help pf maximize its skip steps by ordering your rules by the criteria in the order they are listed above, i.e. order your rules by interface first. Within the block of rules for the same interface, order rules by direction. Within the block for the same interface and direction, order by address family, etc.
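Applied to a small ruleset, that sort order might look like the following sketch; the interfaces and addresses are made up:

```
# all fxp0 rules first, 'in' before 'out', tcp before udp
pass in  on fxp0 inet proto tcp from any to 10.1.2.3 port ssh keep state
pass in  on fxp0 inet proto tcp from any to 10.1.2.3 port www keep state
pass in  on fxp0 inet proto udp from any to 10.1.2.3 port domain keep state
pass out on fxp0 inet proto tcp from any to any keep state
# then all fxp1 rules, in the same internal order
pass in  on fxp1 inet proto tcp from any to 10.2.3.4 port www keep state
pass out on fxp1 inet proto udp from any to any keep state
```

Grouping this way maximizes the skip step values pf can compute for the interface, direction and protocol criteria.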

To verify the effects, run

  $ pfctl -gsr

pfctl prints the calculated skip step values for each criterion in each rule, for instance

  @18 block return-rst in quick on kue0 proto tcp from any to any port = 1433
  [ Skip steps: i=38 d=38 f=41 p=27 sa=48 sp=end da=43 ]

In this output, 'i' stands for interface, 'd' for direction, 'f' for address family, etc. The 'i=38' part means that packets which don't match 'on kue0' will skip the next 38 rules.

This also affects the number of evaluations counted for each rule, try:

  $ pfctl -vsr

pfctl counts how many times each rule has been evaluated, how many packets and bytes it matched and how many states it created. When a rule is skipped by skip steps during evaluation, its evaluation counter is not increased.

Use tables for address lists

The use of lists in curly braces allows you to write very compact rules in pf.conf, like:

  pass proto tcp to { 10.1.2.3, 10.2.3.4 } port { ssh, www }

But these lists are not actually loaded into a single rule in the kernel. Instead, pfctl expands the single input rule to multiple rules for the kernel, in this case:

  $ echo "pass proto tcp to { 10.1.2.3, 10.2.3.4 } port { ssh, www }" |
	pfctl -nvf -
  pass inet proto tcp from any to 10.1.2.3 port = ssh keep state
  pass inet proto tcp from any to 10.1.2.3 port = www keep state
  pass inet proto tcp from any to 10.2.3.4 port = ssh keep state
  pass inet proto tcp from any to 10.2.3.4 port = www keep state

The short syntax in pf.conf betrays the real cost of evaluating it. Your pf.conf might be only a dozen rules long, but if those expand to hundreds of rules in kernel, evaluation cost is the same as if you put those hundreds of rules in pf.conf in the first place. To see what rules are really being evaluated, check:

  $ pfctl -sr

For one specific type of list, addresses, there is a container in the kernel, called a 'table'. For example:

  pass in from { 10.1.2.3, 10.2.3.4, 10.3.4.5 }

The list of addresses can be expressed as a table:

  table <clients> const { 10.1.2.3, 10.2.3.4, 10.3.4.5 }
  pass in from <clients>

This construct can be loaded as a single rule (and a table) into the kernel, whereas the non-table version would expand to three rules.

During evaluation of the rule referencing the table, pf will do a lookup of the packet's source address in the table to determine whether the rule matches the packet. This lookup is very cheap, and the cost does not increase with the number of entries in the table.

If the list of addresses is large, the performance gain of one rule evaluation with one table lookup vs. one rule evaluation for each address is significant. As a rule of thumb, tables are cheaper when the list contains six or more addresses. For a list of 1,000 addresses, the difference will be a factor of 1,000.
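Tables have the additional advantage that they can be modified at runtime without reloading the ruleset; as a sketch, with a made-up table name and address:

```
table <blocked> persist
block in quick on fxp0 from <blocked>
```

Entries can then be added, removed or listed on the fly with pfctl, for instance 'pfctl -t blocked -T add 10.9.8.7' or 'pfctl -t blocked -T show', without touching the loaded rules at all.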

Use quick to abort ruleset evaluation when rules match

When a rule does match, pf (unlike other packet filtering products) does not by default abort ruleset evaluation, but continues until all rules have been evaluated. When the end is reached, the last rule that matched (the last-matching rule) makes the decision.

The option 'quick' can be used in rules to make them abort ruleset evaluation when they match. When 'quick' is used on every single rule, pf's behaviour effectively becomes first-matching, but that's not the default.

For instance, pf filters packets passing through any interface, including virtual interfaces such as loopback. If, like most people, you don't intend to filter loopback traffic, a line like the following at the top of pf.conf can exclude it from filtering entirely:

  set skip on { lo0 }

The ruleset might contain hundreds of rules that all mismatch the loopback interface, and loopback traffic might just pass by the implicit default pass. The difference is between evaluating those hundreds of rules for every loopback packet and not evaluating the ruleset for loopback packets at all.

Usually, you'd place a rule with 'quick' at the top of the ruleset, reasoning that it has the potential of matching and saving the evaluation of the rules further down. But in those cases where the rule does not match a packet, its placement at the top has cost one additional evaluation. In short, the frequency with which a rule is expected to match is also relevant when deciding its placement within the ruleset for performance reasons, and the frequency with which it does match depends on your actual traffic.

Instead of guessing how likely a rule should match on average, you can use the rule evaluation and matching counters that are printed by:

  $ pfctl -vsr

When you see a rule near the top that is evaluated a lot but rarely matches, you can move it further down in the ruleset.

Anchors with conditional evaluation

An anchor is basically a ruleset separate from the main ruleset, or a sub-ruleset. You can load entire rulesets into anchors, and cause them to get evaluated from the main ruleset.

Another way to look at them is to compare filtering rules with a programming language. Without anchors, all your code is in a single main function, the main ruleset. Anchors, then, are just subroutines, code in separate functions that you can call from the main function.

As of OpenBSD 3.6, you can also nest anchors within anchors, building a hierarchy of subroutines, and call one subroutine from another. In OpenBSD 3.5 and before, the hierarchy could only be one level deep, that is, you could have multiple subroutines, but could call subroutines only from the main ruleset.

For instance:

  pass in proto tcp from 10.1.2.3 to 10.2.3.4 port www
  pass in proto udp from 10.1.2.3 to 10.2.3.4
  pass in proto tcp from 10.1.2.4 to 10.2.3.5 port www
  pass in proto tcp from 10.1.2.4 to 10.2.3.5 port ssh
  pass in proto udp from 10.1.2.4 to 10.2.3.5
  pass in proto tcp from 10.1.2.5 to 10.2.3.6 port www
  pass in proto udp from 10.1.2.5 to 10.2.3.6
  pass in proto tcp from 10.1.2.6 to 10.2.3.7 port www

You could split the ruleset into two sub-rulesets, one for UDP called “udp-only”:

  pass in proto udp from 10.1.2.3 to 10.2.3.4
  pass in proto udp from 10.1.2.4 to 10.2.3.5
  pass in proto udp from 10.1.2.5 to 10.2.3.6

And a second one for TCP called “tcp-only”:

  pass in proto tcp from 10.1.2.3 to 10.2.3.4 port www
  pass in proto tcp from 10.1.2.4 to 10.2.3.5 port www
  pass in proto tcp from 10.1.2.4 to 10.2.3.5 port ssh
  pass in proto tcp from 10.1.2.5 to 10.2.3.6 port www
  pass in proto tcp from 10.1.2.6 to 10.2.3.7 port www

Both of them can be called from the main ruleset with:

  anchor udp-only
  anchor tcp-only

That would not improve performance much, though. Actually, there is some overhead involved when the kernel has to step into and out of these sub-rulesets.

But anchor calls can also contain filter criteria, much like pass/block rules:

  anchor udp-only in on fxp0 inet proto udp
  anchor tcp-only in on fxp0 inet proto tcp

The sub-ruleset is only evaluated for packets that match the criteria. In other words, the subroutine is only conditionally evaluated. When the criteria do not match, the call is skipped, and the evaluation cost is limited to the comparison of the criteria in the call.
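In pf.conf, such conditional anchor calls can be combined with 'load anchor' rules to populate the sub-rulesets from separate files; the paths here are made up:

```
anchor udp-only in on fxp0 inet proto udp
anchor tcp-only in on fxp0 inet proto tcp
load anchor udp-only from "/etc/pf.udp.conf"
load anchor tcp-only from "/etc/pf.tcp.conf"
```

This keeps each sub-ruleset in its own file, and each anchor is only evaluated for packets matching the criteria on its call.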

For performance, this is mainly relevant when the sub-ruleset contains many rules, and the call criteria are not those primarily optimized by skip steps.

Let pfctl do the work for you

As of OpenBSD 3.6, several of the optimizations discussed can be automated by pfctl -o. The optimizer analyzes a ruleset and makes modifications that do not change the effect of the ruleset.

First, pfctl splits the ruleset into blocks of adjacent rules in such a way that reordering rules within one block cannot possibly affect the outcome of evaluation for any packet.

For example, the rules in the following block can be arbitrarily reordered:

  pass proto tcp to 10.1.2.3 port www keep state
  pass proto udp to 10.1.2.3 port domain keep state
  pass proto tcp to 10.2.3.4 keep state

But in most cases rule order is relevant. For instance:

  block log all
  block from 10.1.2.3
  pass from any to 10.2.3.4

Changing the position of either of those rules produces completely different effects. After swapping the first two rules, packets from 10.1.2.3 still get blocked, but they're now also logged. Exchange the last two rules, and packets from 10.1.2.3 to 10.2.3.4 are suddenly blocked. And switching the first and last rule blocks every packet.

In every case of possible dependency, pfctl splits the rules into separate blocks. In the worst case, when no two adjacent rules can be freely reordered, each rule becomes a separate block containing only that rule, and pfctl can't make any modifications.

Otherwise, pfctl sorts the rules in each block so that skip step values are maximized:

  $ cat example
  pass proto tcp from 10.0.0.3 to 10.0.0.8
  pass proto udp from 10.0.0.1
  pass proto tcp from 10.0.0.2
  pass proto tcp from 10.0.0.4
  pass proto udp from 10.0.0.6
  pass proto tcp from 10.0.0.5 to 10.0.0.7

  $ pfctl -onvf example
  pass inet proto tcp from 10.0.0.3 to 10.0.0.8
  pass inet proto tcp from 10.0.0.5 to 10.0.0.7
  pass inet proto tcp from 10.0.0.2 to any
  pass inet proto tcp from 10.0.0.4 to any
  pass inet proto udp from 10.0.0.1 to any
  pass inet proto udp from 10.0.0.6 to any

When duplicate rules are found, they are removed:

  $ cat example
  pass proto tcp from 10.0.0.1
  pass proto udp from 10.0.0.2
  pass proto tcp from 10.0.0.1

  $ pfctl -onvf example
  pass inet proto tcp from 10.0.0.1 to any
  pass inet proto udp from 10.0.0.2 to any

Redundant rules are removed as well:

  $ cat example
  pass proto tcp from 10.1/16
  pass proto tcp from 10.2.3.4
  pass proto tcp from 10/8

  $ pfctl -onvf example
  pass inet proto tcp from 10.0.0.0/8 to any

Multiple rules are combined into a single rule using a table where possible and advantageous:

  $ cat example
  pass from 10.1.2.3
  pass from 10.2.3.4
  pass from 10.3.4.5
  pass from 10.4.5.6
  pass from 10.5.6.7
  pass from 10.8.9.1

  $ pfctl -onvf example
  table <__automatic_0> const { 10.1.2.3, 10.2.3.4, 10.3.4.5, 10.4.5.6,
                                10.5.6.7, 10.8.9.1 }
  pass inet from <__automatic_0> to any

When called with -oo, pfctl also consults the evaluation counters shown by pfctl -vsr to reorder 'quick' rules according to matching frequency.

The optimizer is very conservative in making changes, performing only those that are certain not to affect the outcome of the ruleset evaluation under any circumstances for any packet. This has the advantage that it can be used safely with any ruleset. The drawback is that pfctl might not dare change something which you could, if you thought about the effect of the change. Like the skip step optimization, the performance improvement depends on how large the blocks of reorderable rules are. By manually reordering rules first, you can potentially improve the gain the optimizer can produce.

The easiest way to see what -o or -oo does with your ruleset is to compare its suggestion with the original ruleset, like this:

  $ pfctl -nvf /etc/pf.conf >before
  $ pfctl -oonvf /etc/pf.conf >after
  $ diff -u before after

When run against a manually optimized ruleset, the differences are usually unspectacular. Significant improvements can be expected for rulesets that were automatically generated by rule editing frontends.

Testing Your Firewall


A packet filter enforces a filtering policy by evaluating the rules of a ruleset and passing or blocking packets accordingly. This chapter explains how to test whether a pre-defined policy is being enforced correctly, and how to find and correct mistakes when it isn't.

During the course of this chapter, we'll be comparing the task of writing a firewall ruleset to computer programming in general. If you don't have any experience with computer programming, this approach might sound complicated and unsettling. The configuration of a firewall shouldn't require a degree in computer science or experience in programming, right?

The answer is no, it shouldn't, and it mostly doesn't. The language used in rulesets to configure pf is designed to resemble human language. For instance:

  block all
  pass out all keep state
  pass in proto tcp to any port www keep state

Indeed, it doesn't take a computer programmer to understand what this ruleset does or to intuitively write a ruleset to implement a similarly simple policy. Chances are good that a ruleset created like this will do precisely what the author wanted.

Unfortunately, computers do what you tell them to do instead of what you want them to do. Worse, they can't tell the difference between the two, if there is any. If the computer doesn't do precisely what you want, even though you assumed you made your instructions clear, it's up to you to identify the difference and reformulate the instructions. Since this is a common problem in programming, we can look at how programmers deal with it. It turns out that the skills and methods used to test and debug programs and rulesets are very similar. You won't need to know any programming languages to understand the implications for firewall testing and debugging.

A well-defined filtering policy

The filtering policy is an informal specification of what the firewall is supposed to do. A ruleset, like a program, is the implementation of a specification, a set of formal instructions executed by a machine. In order to write a program, you need to define what it should do.

Hence, the first step in configuring a firewall is specifying informally what it is supposed to do. What connections should it block or pass? An example would be:

  • There are three distinct sections of the network that are isolated from each other by the firewall. Any connection that crosses the border of one section must pass through the firewall. The firewall has three interfaces, each connected to one of the sections:
  • $ext_if to the external internet
  • $dmz_if to a DMZ with servers
  • $lan_if to a LAN with workstations
  • Hosts in the LAN may freely connect to any hosts in the DMZ or the external internet.
  • Servers in the DMZ may freely connect to hosts on the external internet. Hosts on the external internet may connect only to the following servers in the DMZ:
  • web server port 80
  • mail server port 25
  • Anything else is prohibited (for instance, external hosts may not connect to hosts on the LAN)

The policy is expressed informally, in any way a human reader can understand it. It should be specific enough that the reader can clearly deduce decisions of the form 'should a connection from host X to host Y coming in (or going out) on interface Z be blocked or passed?' If you can think of cases where the policy doesn't give a clear answer to such a question, the policy is not specific enough.

Vague policies like “allow only what is strictly necessary” or “block attacks” need to be made more precise, or you won't be able to implement or test them. As in software development, incomplete specifications rarely lead to meaningful and correct implementations (“why don't you start writing code, while I go find out what the customer wants”).

You might receive a complete policy whose implementation is your task, or defining the policy might be part of your task. In either case, you'll need to have the policy in hand before you can finish implementing and testing it.

A ruleset implementing the policy

The ruleset is written as a text file containing statements in a formal language. Just as the source code in a programming language is parsed and translated into machine-level instructions by a compiler, the ruleset source text is parsed and translated by pfctl, and the result is then interpreted by pf in the kernel.
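As an illustration, the example policy from the previous section might be implemented by a ruleset along these lines. This is only a sketch: the interface macro values and the server macros ($www_server, $mail_server) are hypothetical placeholders, not values from the policy text.

```
ext_if = "fxp0"                 # external internet (placeholder)
dmz_if = "fxp1"                 # DMZ with servers (placeholder)
lan_if = "fxp2"                 # LAN with workstations (placeholder)
www_server  = "192.0.2.10"      # hypothetical DMZ web server
mail_server = "192.0.2.25"      # hypothetical DMZ mail server

block all

# LAN hosts may connect to the DMZ and the external internet
pass in on $lan_if from $lan_if:network keep state

# DMZ servers may connect to the external internet, but not to the LAN
pass in on $dmz_if from $dmz_if:network to ! $lan_if:network keep state

# external hosts may reach only the web and mail servers
pass in on $ext_if proto tcp to $www_server  port 80 keep state
pass in on $ext_if proto tcp to $mail_server port 25 keep state

# corresponding pass out rules on the outbound interfaces
pass out on $ext_if keep state
pass out on $dmz_if keep state
pass out on $lan_if keep state
```

Everything not explicitly passed, such as connections from external hosts to the LAN, falls through to the initial block rule.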

When the source code violates the formal language, the parser reports a syntax error and refuses to translate the text file. This is called a compile-time error, and such errors are reliably detected and usually resolved quickly. When pfctl can't parse your ruleset file, it reports the number of the line in the file where the error occurred, together with a more or less accurate description of what it didn't understand. Unless the entire ruleset file can be parsed without any syntax errors, pfctl does not change the previous ruleset in the kernel. As long as the ruleset file contains one or more syntax errors, there is no program pf can execute.

The second type of error is called a run-time error, as it occurs only when a syntactically correct program that has been successfully translated is running. With a generic programming language, this might occur when the program divides a number by zero, tries to access an invalid memory location or runs out of memory. Since pf rulesets are very limited compared to the functionality of a generic programming language, most of these errors cannot occur when a ruleset is executed, i.e. a ruleset can't 'crash' like a generic program. But, of course, a ruleset can produce the wrong output at run-time, by blocking or passing packets that it shouldn't according to the policy. This is sometimes called a 'logic error': an error that doesn't abort the execution of a program, but 'merely' produces incorrect output.

So, before we can start to test whether the firewall correctly implements our policy, we first need to have a ruleset loaded successfully.

Parser errors

Parser errors are reported when you try to load a ruleset file with pfctl, for instance:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: syntax error

The message means that there is a syntax error on line 3 of the file /etc/pf.conf and pfctl couldn't load the ruleset. The in-kernel ruleset has not been changed; it remains the same as it was before the failed attempt to load the new ruleset.

There are many different error messages that pfctl can produce. The first step is to take a close look at the message and read it carefully. Not all parts may make sense immediately, but reading the whole message gives you the best chance of understanding what is going wrong. If the message has the format “filename:number: text”, it refers to the line with that number inside the file with that name.

The next step is to look at the specific line, either using a text editor (in vi, you can jump to line 3 by entering 3G in command mode), or like this:

  # cat -n /etc/pf.conf
       1  int_if = "fxp 0"
       2  block all
       3  pass out inet on $int_if all kep state

  # head -n 3 /etc/pf.conf | tail -n 1
  pass out inet on $int_if all kep state

The problem might be a simple typo, like in this case (“kep” instead of “keep”). After fixing that, we try to reload the file:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: syntax error
  # head -n 3 /etc/pf.conf | tail -n 1
  pass out inet on $int_if all keep state

Now the keywords are all valid, but on closer inspection, we notice that the placement of the “inet” keyword before “on $int_if” is invalid. This also illustrates that the same line can contain more than a single mistake. pfctl only reports the first problem it finds, and then aborts. If, on retry, it reports the same line again, there might be more mistakes on it, or the first problem wasn't corrected properly.

Misplacement of keywords is another common mistake. It can be identified by comparing the rule with the BNF syntax at the bottom of the pf.conf(5) man page, which contains:

     pf-rule        = action [ ( "in" | "out" ) ]
                      [ "log" | "log-all" ] [ "quick" ]
                      [ "on" ifspec ] [ route ] [ af ] [ protospec ]
                      hosts [ filteropt-list ]
     ifspec         = ( [ "!" ] interface-name ) | "{" interface-list "}"
     af             = "inet" | "inet6"

This implies that “inet” should come after “on $int_if”.

We correct that and retry again:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: syntax error
  # head -n 3 /etc/pf.conf | tail -n 1
  pass out on $int_if inet all keep state

There is nothing obviously wrong left now. But we're not seeing all the relevant details. The line depends on the definition of the macro $int_if. Could that be wrongly defined? Let's see:

  # pfctl -vf /etc/pf.conf
  int_if = "fxp 0"
  block drop all
  /etc/pf.conf:3: syntax error

After fixing the mistyped “fxp 0” into “fxp0”, we retry again:

  # pfctl -f /etc/pf.conf

No output means the file was loaded successfully.

In some cases, pfctl can provide a more specific error message instead of the generic “syntax error”, like:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: port only applies to tcp/udp
  /etc/pf.conf:3: skipping rule due to errors
  /etc/pf.conf:3: rule expands to no valid combination
  # head -n 3 /etc/pf.conf | tail -n 1
  pass out on $int_if to port ssh keep state

The error reported first is usually the most helpful one, and subsequent errors might be misleading. In this case, the problem is that the rule specifies a port criterion without specifying either proto udp or proto tcp.

In rare cases, pfctl is confused by unprintable characters or whitespace in the file, and the mistake is hard to spot without making those characters visible:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:2: whitespace after \
  /etc/pf.conf:2: syntax error
  # cat -ent /etc/pf.conf
       1  block all$
       2  pass out on gem0 from any to any \ $
       3  ^Ikeep state$

The problem is the blank after the backslash on the second line, just before the end of the line, which cat -e marks with a dollar sign.

Once the ruleset loads successfully, it can be a good idea to look at the result:

  $ cat /etc/pf.conf
  block all
  # pass from any to any \
  pass from to any
  $ pfctl -f /etc/pf.conf
  $ pfctl -sr
  block drop all

The backslash at the end of the comment line continues the comment onto the next line, so the pass rule is swallowed by the comment.

Expansion of {} lists can have surprising results, which also shows up in the parsed ruleset:

  $ cat /etc/pf.conf
  pass from { !, ! } to any
  $ pfctl -nvf /etc/pf.conf
  pass inet from ! to any
  pass inet from ! to any

Here, the problem is that “{ !, ! }” doesn't mean “any address except and”, the expansion will literally match any possible address.
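One way to express the intended exclusion, sketched here with hypothetical addresses standing in for the two excluded hosts, is to block them with a 'quick' rule before a general pass rule:

```
# hypothetical addresses standing in for the two excluded hosts
table <excluded> const { 10.1.2.3, 10.2.3.4 }

block in quick from <excluded>   # matches first and aborts evaluation
pass  in from any                # everyone else is passed
```

Because the block rule uses 'quick', packets from the excluded hosts never reach the pass rule, which is the behavior the negated list appeared to promise.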

You should reload your ruleset after making permanent changes, to make sure pfctl can load it during the next reboot. On OpenBSD, the rc(8) startup script /etc/rc first loads a small default ruleset, which blocks everything by default except for traffic required during startup (like dhcp or ntp). When it subsequently fails to load the real ruleset /etc/pf.conf, due to syntax errors introduced before the reboot without testing, the small default ruleset will remain active. Luckily, the small default ruleset allows incoming ssh, so the problem can still be fixed remotely.


Testing

Once you have a well-defined policy and a ruleset implementing it, testing means verifying that the implementation matches the policy.

There are two ways for the ruleset to fail: it might block connections which should be passed or it might pass connections which should be blocked.

Testing generally refers to empirically trying various cases of connections in a systematic way. There is an almost infinite number of different connections that a firewall might be facing, and it would be infeasible to try every combination of source and destination addresses and ports on all interfaces. Proving the correctness of a ruleset might be possible for very simple rulesets. In practice, the better approach is to create a list of test cases based on the policy, such that every aspect of the policy is covered. For instance, for our example policy the list of cases to try would be:

  • a connection from LAN to DMZ (should pass)
  • from LAN to external (should pass)
  • from DMZ to LAN (should block)
  • from DMZ to external (should pass)
  • from external to the DMZ web server on port 80 (should pass)
  • from external to the DMZ web server on port 25 (should block)
  • from external to the DMZ mail server on port 25 (should pass)
  • from external to the DMZ mail server on port 80 (should block)
  • from external to LAN (should block)

The expected result should be defined in this list before the actual testing starts. When, during testing, the observed result is different from the expectation, the test has succeeded in finding an error in the implementation.

This might sound odd, but the goal of each test should be to find an error in the firewall implementation, not to NOT find one. The overall goal of the process is to build a firewall ruleset without any errors, but if you assume that there are errors, you want to find them rather than miss them. When you assume the role of the tester, you have to adopt a destructive mindset and try to break the ruleset. Only then does a failure to break it become a constructive argument that the ruleset is free of errors.

TCP and UDP connections can generally be tested with nc(1). nc can be used both as client and as server (using the -l option). For ICMP queries and replies, ping(8) is a simple testing client.
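For example, a single TCP test case can be tried with nc acting as server on one host and client on another. The address and port here are hypothetical placeholders:

```
# on the destination host, listen on TCP port 8080
$ nc -l 8080

# on the source host, attempt the connection
$ nc -v 192.0.2.10 8080
```

If the policy says the connection should pass, you expect nc to report success; if it should be blocked, you expect a refusal or a timeout, depending on whether the firewall returns an RST or drops silently.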

To test connections that should be blocked, you can use any kind of tool that attempts to communicate with the target.

Using tools from ports, like nmap(1), you can scan multiple ports, even across multiple target hosts. Make sure to read the man page when results look odd. For instance, a TCP port is reported as 'unfiltered' when nmap receives a RST from pf. Also, if the host that runs nmap is itself running pf, that might interfere with nmap's ability to do its job properly.

There are some online penetration testing services which allow you to scan yourself from the Internet. Some draw invalid conclusions depending on how you block ports with pf (block drop vs. return-rst or return-icmp); be sure you understand what they do and how they reach their conclusions before you get alarmed.

More advanced intrusion tools might include things like IP fragmentation or sending of invalid IP packets.

To test connections that should pass according to the policy, it's best to simply use the protocols with the common applications that legitimate users will be using. For instance, if you should pass HTTP connections to a web server, fetching various content with different web browsers from different client hosts is a better test than just confirming that the TCP handshake to the server works with nc(1). Some errors are affected by factors like the hosts' operating systems; for instance, you might see problems with TCP window scaling or TCP SACK only between hosts running specific operating systems.

When the expected result is 'pass', there are several ways in which an observed result can differ. The TCP handshake might fail because the peer (or the firewall) is returning an RST. The handshake might simply time out. The handshake might complete and the connection might work for a while but then stall or reset. The connection might work permanently, but throughput or latency might be different from expectations, either lower than expected or higher (in case you expect AltQ to rate-limit the connection).

Expected results can include other aspects than the block/pass decision, for instance whether packets are logged, how they are translated, how they are routed or whether they are increasing counters as expected. If you care about these aspects, they're worth including in the list of things to test together with their expected results.

Your policy might include requirements regarding performance, reaction to overloading or redundancy. These could require dedicated tests. If you set up fail-over using CARP, you probably want to test what happens in case of various kinds of failures.

When you do observe a result that differs from the expectation, systematically note what you did during the test, what result you expected, why you expected that result, what result you observed and how the observation differs from the expectation. Repeat the test to see whether the observed result is consistently reproducible or whether results vary. Try varying but similar parameters (like different source or destination addresses or ports).

Once you have a reproducible problem, the next step is to debug it, to understand why things don't work as expected and how to fix them. When, during the course of this, you modify the ruleset, you'll have to repeat the entire list of tests again, including the tests that didn't show a problem previously. The change you made might have inadvertently broken something that worked before.

The same applies to other changes made to the ruleset. A formal procedure for testing can make the process less error-prone. You're probably not going to repeat the entire test procedure for every little change you make to the ruleset. Some changes are trivial and shouldn't be able to break anything. But sometimes they do, or the sum of several changes introduces an error. You can keep your ruleset file under a revision control system like cvs(1), which helps in investigating past changes to the file when errors are discovered. If you know that the error was not present a week ago but now is, looking at all changes made to the file over the last week can help spot the problem, or at least allows you to revert the changes until there's time to investigate further.

Non-trivial rulesets are like programs: they are rarely perfect in their first version, and it takes a while until they can be trusted to be free of bugs. Unlike real programs, which most programmers never consider completely bug-free, rulesets are simple enough that they usually do mature to that point.


Debugging

Debugging refers to finding and removing programming mistakes in computer programs. In the context of firewall rulesets, the term refers to the process of identifying why a ruleset's evaluation does not produce the expected result. The types of mistakes that can be made in rulesets are very limited compared to real computer programs, yet the methods used to find them are the same.

Before you start searching for the cause of a problem you should define what exactly is considered the problem. If you've spotted an error during testing yourself, that can be very simple. If another person is reporting the problem, it can be difficult to extract the essence from a vague problem report. The best starting point is a problem that you can reliably reproduce yourself.

Some network problems are not actually caused by pf. Before you focus on debugging your pf configuration, it is worth establishing that pf is indeed responsible for the problem. This is simple to do and can save a lot of time searching in the wrong place. Just disable pf with pfctl -d and verify that the problem goes away. If it does, re-enable pf with pfctl -e and verify that the problem occurs again. This does not apply to certain kinds of problems, like when pf is not NAT'ing connections as desired; in that case the problem obviously can't go away when pf is disabled. But when possible, try to first prove that pf must be responsible.
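The disable/re-enable check described above looks like this in practice (run as root):

```
# pfctl -d        # disable pf, then reproduce the problem
pf disabled
# pfctl -e        # re-enable pf and verify the problem reappears
pf enabled
```

If the problem persists while pf is disabled, the cause lies elsewhere, for instance in routing or in the application itself.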

Similarly, if the problem is that pf is not doing something you expect it to do, the first step should be to ensure that pf is actually running and the intended ruleset is successfully loaded, with:

  # pfctl -si | grep Status
  Status: Enabled for 4 days 13:47:32           Debug: Urgent
  # pfctl -sr
  pass quick on lo0 all
  pass quick on enc0 all

Debugging protocols

The second prerequisite to debugging is expressing the problem in terms of specific network connections. For instance, if the report is 'instant messaging using application X is not working', you need to find out what kind of connections are involved. The conclusion might be, for instance, that 'host A cannot establish a TCP connection to host B on port C'. Sometimes, this represents the entire difficulty, and once you understand what connections are involved, you realize that the ruleset doesn't allow them yet, and a simple change to the ruleset resolves the issue.

There are several ways to find out what connections are used by an application or protocol. tcpdump(8) can show packets arriving at or leaving from interfaces, both real interfaces like network interface cards and virtual interfaces like pflog(4) and pfsync(4). You can supply an expression that filters what packets are being shown, thereby excluding existing noise on the network. Attempt to communicate using the application or protocol in question, and see what packets are being sent. For example:

  # tcpdump -nvvvpi fxp0 tcp and not port ssh and not port smtp
  23:55:59.072513 > S
    4093655771:4093655771(0) win 5840 <mss 1380,sackOK,timestamp
    1039287798 0,nop,wscale 0> (DF)

This is a TCP SYN packet, the first packet of the TCP handshake. The sender is port 65123 (which looks like a random high port) and the recipient is port 6667. A detailed description of the output format can be found in the tcpdump(8) man page. tcpdump is the most important tool for debugging pf related problems, and it's well worth getting familiar with.

Another approach is to use pf's log feature. Assuming you use the 'log' option in all 'block' rules, almost every packet blocked by pf will be logged. You can remove the 'log' option from rules that deal with known protocols, so that only packets blocked on unknown ports are logged. Try to use the blocked application and check pflog, like:

  # ifconfig pflog0 up
  # tcpdump -nettti pflog0
  Nov 26 00:02:26.723219 rule 41/0(match): block in on kue0: > S 3537828346:3537828346(0) win
    16384 <mss 1380,nop,nop,sackOK,[|tcp]> (DF)

If you're using pflogd(8), the daemon will constantly listen on pflog0 and store the log in /var/log/pflog, which you can view with:

  # tcpdump -netttr /var/log/pflog

When dumping pf logged packets, you can pass extended filtering expressions to tcpdump; for instance, it can show only logged packets that were blocked incoming on interface wi0 with:

  # tcpdump -netttr /var/log/pflog inbound and action block and on wi0

Some protocols, like FTP, are not that easy to match, because they don't use fixed port numbers or use multiple related connections. It might not be possible to pass them through the firewall without opening up a wide range of ports. For specific protocols there are solutions, like ftp-proxy(8).

Debugging rulesets

When your ruleset is blocking a certain protocol because you didn't allow a necessary port, the problem is more of a design flaw than a bug in the ruleset. But what if you see a connection blocked that you have an explicit pass rule for?

For example, your ruleset might contain the rule

  block in return-rst on $ext_if proto tcp from any to $ext_if port ssh

But when you try to connect to TCP port 22, the connection is accepted! It appears as if the firewall is ignoring your rule. As puzzling as such cases may be the first couple of times you experience them, there's always a logical and often trivial explanation.

First, you should verify everything we just assumed so far. For instance, we assumed that pf is running and the ruleset contains the rule above. It might be unlikely that these assumptions are wrong, but they're quickly verified:

  # pfctl -si | grep Status
  Status: Enabled for 4 days 14:03:13           Debug: Urgent
  # pfctl -gsr | grep 'port = ssh'
  @14 block return-rst in on kue0 inet proto tcp from any to port = ssh

Next, we assume that a TCP connection to port 22 is passing in on kue0. You might think that's obviously true, but it's worth verifying. Start tcpdump:

  # tcpdump -nvvvi kue0 tcp and port 22 and dst

Then repeat the SSH connection. You should see the packets of your connection in tcpdump's output. If you don't, that might be because the connection isn't actually passing through kue0, but through another interface, which would explain why the rule isn't matching. Or you might be connecting to a different address. In short, if you don't see the SSH packets arrive, pf won't see them either, and can't possibly block them using the rule in question.

But if you do see the packets with tcpdump, pf should see and filter them as well. The next assumption is that the block rule is not just present somewhere in the ruleset (which we verified already), but is the last matching rule for these connections. If it isn't the last matching rule, obviously it doesn't make the block decision.

How can the rule not be the last matching rule? Three reasons are possible:

  • a) The rule does not match because rule evaluation doesn't reach the rule. An earlier rule could also match and abort evaluation with the 'quick' option.
  • b) Rule evaluation reaches the rule, but the rule doesn't match the packet because some criteria in the rule mismatches.
  • c) Rule evaluation reaches the rule and the rule does match, but evaluation continues and a subsequent rule also matches.
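To make the three cases concrete, here is a sketch of how each can occur around the block rule being debugged. The interface and rule number follow the example above; $admin_host is a hypothetical macro:

```
# a) an earlier 'quick' rule matches and aborts evaluation
pass in quick on kue0 proto tcp from $admin_host to any port ssh keep state

# the block rule being debugged (rule @14 in the example);
# b) would occur if a criterion here mismatched, e.g. the wrong interface
block return-rst in on kue0 proto tcp from any to any port ssh

# c) a later rule also matches and, as the last match, makes the decision
pass in on kue0 proto tcp from any to any port ssh keep state
```

In this sketch, an SSH connection from $admin_host is passed by a), and any other SSH connection is passed by c), so the block rule never ends up as the last matching rule.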

To disprove these three cases, you can view the loaded ruleset, and mentally emulate a ruleset evaluation for a hypothetical TCP packet incoming on kue0 to port 22. Mark the block rule we're debugging. Start evaluation with the first rule. Does it match? If it does, mark the rule. Does it also have 'quick'? If so, abort evaluation. If not, continue with the next rule. Repeat until a rule matches and uses 'quick' or you reach the end of the ruleset. Which rule was the last matching one? If it isn't rule number 14, you have found the explanation for the problem.

Manually evaluating the ruleset like this can be tedious, even though it can be done pretty quickly and reliably with more experience. If the ruleset is large, you can temporarily reduce it. Save a copy of the real ruleset and remove all rules that you think can't affect this case. Load that ruleset and repeat the test. If the connection is now blocked, the conclusion is that one of the seemingly unrelated rules you removed is responsible for either a) or c). Re-add the rules one by one and repeat the test, until you reach the responsible rule. If the connection is still passed after removal of all unrelated rules, repeat the mental evaluation of the now reduced ruleset.

Another approach is to use pf's logging to identify the cases a) and c). Add 'log' to all 'pass quick' rules before rule 14. Add 'log' to all 'pass' rules after rule 14. Start tcpdump on pflog0 and establish a new SSH connection. You'll see what rule other than rule 14 is matching the packet last. If nothing is logged, the explanation must be b).

Following connections through the firewall

When a connection passes through the firewall, packets pass in on one interface and out on another. Replies pass in on the second interface and out of the first. Connections can therefore fail because pf is blocking packets in either of these four cases.

First you should find out which of the four cases is the problem. When you try to establish a new connection, you should see the TCP SYN on the first interface using tcpdump. You should see the same TCP SYN leaving out on the second interface. If you don't, the conclusion is that pf is blocking the packet in on the first interface or out on the second.

If the SYN is not blocked, you should see a SYN+ACK arrive in on the second interface and out on the first. If not, pf is blocking the SYN+ACK on either interface.

Add 'log' to the rules which should pass the SYN and SYN+ACK on both interfaces, as well as to all block rules. Repeat the connection attempt and check pflog. It should tell you precisely which case was blocked and by what rule.
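With the macros used in earlier examples ($ext_if, $dmz_if; $www_server is hypothetical), the logging rules for such a connection might look like:

```
block log all

# log the SYN passing in on the first and out on the second interface;
# replies match the corresponding states in the reverse direction
pass in  log on $ext_if proto tcp to $www_server port 80 keep state
pass out log on $dmz_if proto tcp to $www_server port 80 keep state
```

Watching pflog0 during a connection attempt then shows which of the four crossings is being blocked, and by which rule.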

Debugging states

The most common reason for pf to block a packet is because of an explicit block rule in the ruleset. The relevant last-matching block rule can be identified by adding the 'log' option to all potential rules and watching the pflog interface.

There are very few cases where pf silently drops packets not based on rules, where adding 'log' to all rules does not cause the dropped packets to get logged through pflog. The most common case is when a packet almost, but not entirely, matches a state entry.

Remember that for each packet seen, pf first does a state lookup. If a matching state entry is found, the packet is passed immediately, without evaluation of the ruleset.

A state entry contains information related to the state of one connection. Each state entry has a unique key, consisting of several values that are constant throughout the lifetime of the connection:

  • address family (IPv4 or IPv6)
  • source address
  • destination address
  • protocol (like TCP or UDP)
  • source port
  • destination port

This key is shared among all packets related to the same connection, and packets related to different connections always have different keys.

When a new state entry is created by a 'keep state' rule, the entry is stored in the state tree using the state's key. An important limitation of the state tree is that all keys must be unique. That is, no two state entries can have the same key.

It might not be immediately obvious why the same two peers cannot establish multiple concurrent connections involving the same addresses, protocol and ports, but this is actually a fundamental property of both TCP and UDP. In fact, the peers' TCP/IP stacks are only able to associate individual packets with their appropriate sockets by doing a similar lookup based on addresses and ports.

Even when a connection is closed, the same pair of addresses and ports cannot be reused immediately. The network might deliver a retransmitted packet of the old connection late, and if the recipient's TCP/IP stack were to falsely associate this packet with a new connection, that would disturb or even reset the new connection. For this reason, both peers are required to wait a specific period of time, called 2MSL for 'twice the maximum segment lifetime', before reusing an old pair of addresses and ports for a new connection.

You can observe this by manually establishing multiple connections between the same peers. For instance, say you have a web server running on port 80, and you connect to it from a client using nc(1) twice, like this:

  $ nc -v 80 & nc -v 80
  Connection to 80 port [tcp/www] succeeded!
  Connection to 80 port [tcp/www] succeeded!

While the connections are still open, you can use netstat(8) on the client or server to list the connections:

  $ netstat -n | grep
  tcp        0      0 ESTABLISHED
  tcp        0      0 ESTABLISHED

As you can see, the client has chosen two different (random) source ports, so it doesn't violate the requirement of key uniqueness.

You can tell nc(1) to use a specific source port with -p, like:

  $ nc -v -p 31234 80 & nc -v -p 31234 80
  Connection to 80 port [tcp/www] succeeded!
  nc: bind failed: Address already in use

The TCP/IP stack of the client prevents the violation of the key uniqueness requirement. Some rare and faulty TCP/IP stacks do not respect this rule, and pf will block their connections when they violate the key uniqueness, as we'll see soon.

Let's get back to how pf does a state lookup when a packet is being filtered. The lookup consists of two steps. First, the state table is searched for a state entry with a key matching the protocol, addresses and ports of the packet. This search accounts for packets flowing in either direction. For instance, assume the following packet has created a state entry:

incoming TCP from to

A lookup for the following packets would find this state entry:

incoming TCP from to
outgoing TCP from to

The state includes information about the direction (incoming or outgoing) of the initial packet that created the state. For instance, the following packets would NOT match the state entry:

outgoing TCP from to
incoming TCP from to

The reason for this restriction is not obvious, but quite simple. Imagine you only have a single interface with address where a web server is listening on port 80. When client connects to you (using random source port 28054), the initial packet of the connection comes in on your interface, and all your outgoing replies should be from to. You do not want to pass out packets from to; such packets would make no sense.

If you have a firewall with two interfaces and look at connections passing through the firewall, you'll see that every packet passing in on one interface passes out through the second. If you create state when the initial packet of the connection arrives in on the first interface, that state entry will not allow the same packet to pass out on the second interface, because the direction is wrong in the same way.

Instead, the packet is found to not match the state you already have, and the ruleset is evaluated. You'll have to explicitly allow the packet to pass out on the second interface with a rule. Usually, you'll want to use 'keep state' on that rule as well, so a second state entry is created that covers the entire connection on the second interface.

If you're wondering how it's possible to create a second state for the same connection when we've just explained how states must have unique keys, the explanation is that the state key also contains the direction of the connection, and the entire combination must be unique.
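The key uniqueness rule including direction can be sketched as a dictionary lookup. This is a simplified model, not pf's actual data structures, and the helper names are invented for illustration:

```python
# Simplified model of pf's state table: the direction is part of the
# key, and a lookup tries both the forward orientation (same direction
# as the initial packet) and the reverse orientation (opposite
# direction, source and destination swapped).
states = {}

def insert_state(direction, proto, src, dst):
    key = (direction, proto, src, dst)
    if key in states:
        raise KeyError("state key conflict")
    states[key] = {"packets": 0}

def lookup_state(direction, proto, src, dst):
    fwd = (direction, proto, src, dst)
    rev = ("out" if direction == "in" else "in", proto, dst, src)
    return states.get(fwd) or states.get(rev)

# State created by an incoming packet from a client to a web server:
insert_state("in", "tcp", ("client", 28054), ("server", 80))

# Matches: the same direction, and the reply in the opposite direction.
assert lookup_state("in", "tcp", ("client", 28054), ("server", 80))
assert lookup_state("out", "tcp", ("server", 80), ("client", 28054))

# Does NOT match: the same packet trying to pass *out* unchanged, as on
# the second interface of a forwarding firewall.
assert not lookup_state("out", "tcp", ("client", 28054), ("server", 80))

# A second state for the second interface is still possible, because the
# direction differs and the full key remains unique.
insert_state("out", "tcp", ("client", 28054), ("server", 80))
```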

Now we can also explain the difference between floating and interface-bound states. By default, pf creates states that are not bound to any interface. That is, once you allow a connection in on one interface, packets related to the connection that match the state (including the direction restriction!) are passed on any interface. In simple setups with static routing this is only a theoretical issue. There is no reason why you should see packets of the same connection arrive in through several interfaces or why your replies should leave out through several interfaces. With dynamic routing, however, this can happen. You can choose to restrict states to specific interfaces. By using the global setting 'set state-policy if-bound' or the per-rule option 'keep state (if-bound)' you ensure that packets can match state only on the interface that created the state.

When virtual tunneling interfaces are involved, there are cases where the same connection passes through the firewall multiple times. For instance, the initial packet of a connection might first pass in through interface A, then pass in through interface B, then out through interface C and finally pass out through interface D. Usually the packet will be encapsulated on interfaces A and D and decapsulated on interfaces B and C, so pf sees packets of different protocols, and you can create four different states. Without encapsulation, when the packet is the same on all four interfaces, you may not be able to use some features like translations or sequence number modulation, because that would lead to state entries with conflicting keys. Unless you have a complex setup involving tunneling interfaces without encapsulation and see error messages like 'pf: src_tree insert failed', this should be of no concern to you.

Let's return to the state lookup done for each packet before ruleset evaluation. The search for a state entry with matching key will either find a single state entry or not find any state entry at all. If no state entry is found, the ruleset is evaluated.

When a state entry is found, a second step is performed for TCP packets before they are considered to be part of the known connection and passed without ruleset evaluation: sequence number checking.

There are many forms of TCP attacks, where an attacker is trying to manipulate a connection between two hosts. In most cases, the attacker is not located on the routing path between the hosts. That is, he can't listen in on the legitimate packets being sent between the hosts. He can, however, send packets to either host imitating packets of its peer, by spoofing (faking) his source address. The goal of the attacker might be to prevent establishment of connections or to tear down already established connections (to cause a denial of service) or to inject malicious payload into ongoing connections.

To succeed, the attacker has to correctly guess several parameters of the connection, like source and destination addresses and ports. Especially for well-known protocols, this isn't as impossible as it may appear. If the attacker knows both hosts' addresses and one port (because he's attacking a connection to a known service), he only has to guess one port. Even if the client is using a truly random source port (which isn't typical anyway), the attacker could try all 65536 possibilities in a short period of time.

The only thing that's truly hard to guess for an attacker is the right sequence number (and acknowledgement). If both peers chose their initial sequence numbers randomly (or you're modulating sequence numbers for hosts that have weak ISN generators), an attacker will not be able to guess an appropriate value at any given point during the connection.

Throughout the lifetime of a valid TCP connection, the sequence numbers (and acknowledgements) of individual packets advance according to certain rules. For instance, once a host has sent a particular segment of data and the peer has acknowledged its receipt, there is no legitimate reason for the sender to resend data for the same segment. In fact, an attempt to overwrite parts of already received data is not just invalid according to the TCP protocol, but a common attack.

pf uses these rules to deduce small windows for valid sequence numbers. Typically, pf can be sure that only about 30,000 out of 4,294,967,296 possible sequence numbers are valid at any point during a connection. Only when both a packet's sequence and acknowledgement numbers fall within these windows will pf assume that the packet is legitimate and pass it.

When, during the state lookup, a state is found that matches the packet, the second step is to compare the packet's sequence numbers against the windows of allowed values stored in the state entry. When this second step fails, pf will produce a 'BAD state' message and drop the packet without evaluating the ruleset. There are two reasons for not evaluating the ruleset in this case: it would almost certainly be a mistake to pass the packet, and if ruleset evaluation resulted in a last-matching 'pass keep state' rule, pf couldn't honour that decision and create a new state, as that would create a state key conflict.
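The window comparison can be sketched roughly as follows. This is a deliberately simplified model using the field names from the 'BAD state' log output (lo, high, win); pf's real checks additionally handle window scaling, ackskew bounds, and more:

```python
# Simplified sketch of pf's sequence number check. src holds the
# windows tracked for the packet's sender, dst those for its peer.
def seq_check(src, dst, seq, ack, length):
    return (
        seq + length <= src["high"]           # check 1: receiver's window
        and seq >= src["lo"] - dst["win"]     # check 2: not already acked
        and ack <= dst["high"]                # same checks for the
        and ack >= dst["lo"] - src["win"]     # acknowledgement number
    )

# Windows and packet taken from the example log message below:
src = {"lo": 1185380879, "high": 1185380879, "win": 33304}
dst = {"lo": 1046638749, "high": 1046705357, "win": 33304}

# 1448 bytes of payload overflow the window (seq + len > high), so the
# packet would be dropped with a 'BAD state' message:
assert not seq_check(src, dst, seq=1185380879, ack=1046638749, length=1448)

# A bare ACK with the same numbers stays within the windows:
assert seq_check(src, dst, seq=1185380879, ack=1046638749, length=0)
```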

In order to actually see and log 'BAD state' messages, you'll need to enable debug logging, using:

  $ pfctl -xm

Debug messages are sent to the console, and syslogd by default archives them in /var/log/messages. Look for messages starting with 'pf:', like:

  pf: BAD state: TCP
    [lo=1185380879 high=1185380879 win=33304 modulator=0 wscale=1]
    [lo=1046638749 high=1046705357 win=33304 modulator=0 wscale=1]
    4:4 A seq=1185380879 ack=1046638749 len=1448 ackskew=0 pkts=940:631
  pf: State failure on: 1 |

These messages always come in pairs. The first message shows the state entry at the time the packet was blocked and the sequence numbers of the packet that failed the tests. The second message lists the conditions that were violated.

At the end of the first message, you'll see whether the state was created on an incoming (dir=in) or outgoing (dir=out) packet, and whether the blocked packet was flowing in the same (fwd) or the reverse (rev) direction relative to the initial state-creating packet.

A state contains three address:port pairs, two of which are always equal unless the connection is being translated by nat, rdr or binat. For outgoing connections, the source is printed on the left and the destination on the right. If the outgoing connection involves source translation, the pair in the middle shows the source after translation. For incoming connections, the connection's source is found on the right and the destination in the middle. If the incoming connection involves destination translation, the left-most pair shows the destination after translation. This format corresponds to the output of pfctl -ss, the only difference is that pfctl indicates the direction of the state using arrows instead.

Next, you see the two peers' current sequence number windows in square brackets. The '4:4' means the state is fully established (smaller values are possible during the handshake, larger ones during connection closing). The 'A' indicates that the blocked packet had the ACK flag set, similar to the formatting of TCP flags in tcpdump(8) output, followed by the sequence (seq=) and acknowledgement (ack=) numbers of the blocked packet and the length (len=) of the packet's data payload. ackskew is an internal value of the state entry, only relevant when not equal to zero.

The 'pkts=940:631' part means that the state entry has matched 940 packets in the direction of the initial packet and 631 packets in the opposite direction since it was created. These counters can be especially helpful in identifying the cause of problems occurring during the handshake, when either one is zero, contradicting your expectation that the state has matched packets in both directions.

The second message contains a list of one or more digits. Each digit printed represents one check that failed:

  1. the packet violates the recipient's window (seq + len > high)
  2. the packet contains data already acknowledged (seq < lo - win)
  3. ackskew is smaller than the minimum
  4. ackskew is larger than the maximum
  5. similar to 1, but worse (seq + len > high + win)
  6. similar to 2, but worse (seq < lo - maximum win)

Luckily, 'BAD state' messages are not common for regular real-life traffic; pf's sequence number verification accounts for many benign anomalies. If you see the messages only sporadically and notice no stalling connections, you can safely ignore them. There are many different TCP/IP stacks out there on the Internet, and some of them occasionally produce weird packets.

However, there is one class of problems in pf configuration that can be diagnosed based on the 'BAD state' messages, which are produced steadily in those cases.

Create TCP states on the initial SYN packet

Ideally, TCP state entries are created when the first packet of the connection, the initial SYN, is seen. You can enforce this by following a simple principle:

Use 'flags S/SA' on all 'pass proto tcp keep state' rules!

All initial SYN packets (and only those packets) have the SYN flag set but the ACK flag not set. When all your 'keep state' rules that can apply to TCP packets are restricted to these packets, only initial SYN packets can create states. Therefore, any TCP state created is based on an initial SYN packet.
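In terms of the TCP header's flag bits, 'flags S/SA' means: of the flags in the mask (SYN and ACK), exactly SYN must be set; flags outside the mask are ignored. A minimal sketch:

```python
# TCP header flag bits as defined in RFC 793.
TH_FIN, TH_SYN, TH_RST, TH_PUSH, TH_ACK = 0x01, 0x02, 0x04, 0x08, 0x10

def matches_flags_s_sa(flags):
    # 'flags S/SA': within the SYN|ACK mask, only SYN is set.
    return flags & (TH_SYN | TH_ACK) == TH_SYN

assert matches_flags_s_sa(TH_SYN)                # initial SYN: matches
assert not matches_flags_s_sa(TH_SYN | TH_ACK)   # SYN+ACK: no match
assert not matches_flags_s_sa(TH_ACK)            # plain ACK: no match
assert matches_flags_s_sa(TH_SYN | TH_PUSH)      # bits outside the mask ignored
```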

The reason for creating state only on initial SYN packets is a TCP extension called 'window scaling', defined in RFC 1323. The field of the TCP header used to advertise accepted windows became too small for today's fast links. Modern TCP/IP stacks would like to use larger window values than can be stored in the existing header field. Window scaling means that all window values advertised by one peer are to be multiplied by a certain factor by the recipient, instead of being taken literally. In order for this scheme to work, both peers must understand the extension and advertise their ability to support it during the handshake using TCP options. The TCP options are only present in the initial SYN and SYN+ACK packets of the handshake. If and only if both of those packets contain the option, the negotiation is successful, and all further packets' window values are meant to be multiplied.

If pf didn't know that window scaling is in use, it would take all advertised window values literally and calculate its windows of acceptable sequence numbers incorrectly. Typically, peers start out advertising small windows and gradually advertise larger windows during the course of a connection. Unaware of the scaling factors, pf would at some point start to block packets because it would think one peer is overflowing the other's advertised window. The effects can be more or less subtle. Sometimes the peers react to the loss of packets by going into loss recovery mode and advertising smaller windows. When pf then passes the subsequent retransmissions again, the advertised windows grow again, up to the point where pf blocks packets once more. The effect is that connections temporarily stall and throughput is poor. It's also possible that connections stall completely and time out.

pf does know about window scaling and supports it. However, the prerequisite is that you create state on the initial SYN, so pf can associate the first two packets of the handshake with the state entry. Since the entire negotiation of the window scaling factors takes place only in these two packets, there is no reliable way to deduce the factors after the handshake.
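Numerically, the difference the negotiated factor makes can be illustrated as follows (the values are hypothetical, chosen only to show the arithmetic):

```python
def upper_bound(lo, advertised_win, wscale):
    # The recipient multiplies the advertised window by 2^wscale; a
    # filter unaware of the factor computes too small an upper bound.
    return lo + (advertised_win << wscale)

lo = 100000         # highest sequence number acknowledged so far
advertised = 512    # value in the TCP header's 16-bit window field
wscale = 7          # negotiated factor 2^7 = 128

# With scaling applied, the peer may send up to 65536 bytes past 'lo':
assert upper_bound(lo, advertised, wscale) == 100000 + 65536

# Taken literally (factor 2^0 = 1), only 512 bytes would be allowed,
# and legitimate segments beyond that would be blocked:
assert upper_bound(lo, advertised, 0) == 100512
```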

Window scaling wasn't widely used in the past, but this is changing rapidly. Just recently, Linux started using window scaling by default. If you experience stalling connections, especially when problems are limited to certain combinations of hosts, and you see 'BAD state' messages related to these connections logged, verify that you're really creating states on the initial packet of a connection.

You can tell whether pf has detected window scaling for a connection from the output of pfctl like:

  $ pfctl -vss
   [3046252937 + 58296] wscale 0  [1605347005 + 16384] wscale 1

If you see 'wscale x' printed in the second line (even if x is zero), pf is aware that the connection uses window scaling.

Another simple method to identify problems related to window scaling is to temporarily disable window scaling support on either peer and reproduce the problem. On OpenBSD, the use of window scaling can be controlled with sysctl(8), like:

  $ sysctl net.inet.tcp.rfc1323
  $ sysctl -w net.inet.tcp.rfc1323=0
  net.inet.tcp.rfc1323: 1 -> 0

Similar problems occur when you create a state entry based on packets other than the initial SYN and use 'modulate state' or translations. In both cases, the translation should occur from the very beginning of the connection. If the first packet is not already translated, translation of subsequent packets will usually confuse the recipient and cause it to send replies that pf blocks with 'BAD state' messages.

Firewall Management

Kernel and userland

Packet filtering occurs inside the kernel. The kernel is the core of the operating system, a comparatively small part of the code base containing components like memory management, hardware drivers, file system code, network layers including the TCP/IP stack and pf, and process control.

When the operating system boots, the kernel takes control of the machine. After attaching detected hardware components, it starts the first process, init(8). All further processes are created from there, like the startup scripts rc(8) and netstart(8), or getty(8) which runs login(8) for terminals like your console, which in turn runs your shell when you log in, which then creates processes as you type commands.

Anything outside the kernel, including all processes created by users, is called userland. The kernel has unlimited privileges, while processes are always associated with a user and limited by the privileges of the user, enforced by the kernel.

As a user, you need to communicate with pf in the kernel to load a ruleset, to configure options, and to retrieve information like the contents of the state table or statistical counters. Operations of this kind, where the user initiates a request that pf in the kernel answers, take place through the ioctl(2) interface, using pfctl(8).

There is a second interface between pf in the kernel and userland, bpf(4). Using this interface, a userland process can register itself to receive network packets from the kernel. This is used by pflog(4) for logging.

pfctl and /dev/pf

Most operations of pf are controlled using the pfctl(8) utility. Generally, pfctl is invoked with command line arguments specifying a particular command or request and its parameters. Results are printed on standard output.

pfctl, a userland process, opens the special file /dev/pf and sends ioctl commands through the file handle to the kernel. An ioctl command can both transfer arguments from the process to the kernel as well as transfer results back from the kernel to the process. Some commands given to pfctl by the user translate into a single ioctl call. Others might require several ioctl calls.

The file is special as it does not store data written to it in the file system and has no size:

  $ ls -l /dev/pf
  crw-------  1 root  wheel   73,   0 Nov 22 10:59 /dev/pf

The 'c' in the file mode (the left-most column) stands for character special file. For such files, ls(1) prints the so-called major and minor device numbers in place of the size. The major number, 73 in the output above, indicates which component in the kernel ioctl commands should be dispatched to. Since different architectures support different kinds of devices, the major number of a given device (or a pseudo device, like pf's ioctl interface) varies across architectures and may change between releases. Some devices, but not pf, support multiple instances, and the minor number, 0 in the output above, is used to dispatch commands to specific instances.
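Device type and numbers can also be inspected programmatically. Here's a sketch using Python's os.stat(); it is run against /dev/null below, since /dev/pf only exists on systems with pf, and the numbers differ per platform:

```python
import os
import stat

def device_numbers(path):
    """Return the (major, minor) device numbers of a device node."""
    st = os.stat(path)
    if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
        raise ValueError(f"{path} is not a device node")
    # st_rdev encodes both numbers; os.major/os.minor unpack them.
    return os.major(st.st_rdev), os.minor(st.st_rdev)

major, minor = device_numbers("/dev/null")
print(f"/dev/null: character device, major {major}, minor {minor}")
```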

Access control

File permissions on /dev/pf act as access control for the requests sent through the file handle. Requests that don't modify any aspect of pf, like querying the contents of the state table, require read permission only. Requests that do change the configuration, like loading a ruleset, require both read and write permissions. By default, only root has access to the file:

  $ ls -l /dev/pf
  crw-------  1 root  wheel   73,   0 Nov 22 10:59 /dev/pf
  $ id
  uid=1000(dhartmei) gid=1000(dhartmei) groups=1000(dhartmei), 0(wheel)
  $ pfctl -d
  pfctl: /dev/pf: Permission denied

You can grant other users access to pf by changing these file permissions. For instance, you could allow all members of the wheel group access to read-only functions:

  $ chmod g+r /dev/pf
  $ ls -l /dev/pf
  crw-r-----  1 root  wheel   73,   0 Nov 22 10:59 /dev/pf

Special files like /dev/pf can be recreated with default permissions using the MAKEDEV(8) script:

  $ cd /dev
  $ ./MAKEDEV pf

This script calls mknod(8) to create a character type pseudo device with the major and minor number appropriate for the architecture. On macppc, it runs:

  $ mknod pf c 73 0
  $ chmod 600 pf

The file name does not need to be pf for the kernel to forward requests sent through the file to pf, only the major and minor numbers are relevant. Hence, you could create multiple special files, for instance in locations other than /dev for chrooted daemons or with different file owners or groups.

Note that access to pf, especially write access, should only be granted to trusted users or audited daemons, as it allows direct communication with pf in the kernel. Not only can a malicious user or a compromised daemon with access to pf disturb the operation of the packet filter or bypass your filtering policy, but insufficient input validation (a bug) in the kernel could potentially be exploited with invalid ioctl arguments to escalate privileges locally.

Another feature that affects access control is called securelevel(7). During boot, the kernel initially starts in 'insecure mode', also referred to as single-user and then switches to 'secure mode', known as multi-user. There is an optional 'highly secure mode', which can be set in rc.securelevel(8) to further lock down a system. The system becomes less generally useful in this state, but the harm a compromised root account can do is limited. pf no longer allows ruleset changes once this securelevel is reached.

How pf is started during the boot process

On OpenBSD, pf is automatically enabled at boot time when the following lines are present in either /etc/rc.conf or /etc/rc.conf.local:

  pf=YES                          # Packet filter / NAT
  pf_rules=/etc/pf.conf           # Packet filter rules file

First, the system startup script rc(8) loads a minimal default ruleset and enables pf:

  RULES="block all"
  RULES="$RULES\npass on lo0"
  RULES="$RULES\npass in proto tcp from any to any port 22 keep state"
  RULES="$RULES\npass out proto { tcp, udp } from any to any port 53 keep state"
  RULES="$RULES\npass out inet proto icmp all icmp-type echoreq keep state"
  if ifconfig lo0 inet6 >/dev/null 2>&1; then
    RULES="$RULES\npass out inet6 proto icmp6 all icmp6-type neighbrsol"
    RULES="$RULES\npass in inet6 proto icmp6 all icmp6-type neighbradv"
    RULES="$RULES\npass out inet6 proto icmp6 all icmp6-type routersol"
    RULES="$RULES\npass in inet6 proto icmp6 all icmp6-type routeradv"
  fi
  RULES="$RULES\npass proto { pfsync, carp }"
  case `sysctl vfs.mounts.nfs 2>/dev/null` in
  *[1-9]*)
    # don't kill NFS
    RULES="scrub in all no-df\n$RULES"
    RULES="$RULES\npass in proto udp from any port { 111, 2049 } to any"
    RULES="$RULES\npass out proto udp from any to any port { 111, 2049 }"
    ;;
  esac
  echo $RULES | pfctl -f - -e

This ruleset is active while the network is being started through netstart(8). It only allows traffic necessary during netstart(8), like DNS or NFS. Your real ruleset can't be loaded at this point, because it might contain references to interface names and addresses which don't exist yet, since netstart(8) hasn't run. And you wouldn't want to just pass all traffic until your real ruleset has been loaded, because netstart(8) might start some vulnerable network daemon you rely on being protected by pf. Without the minimal default ruleset, there would be a brief window of vulnerability during each boot.

Afterwards, your full ruleset /etc/pf.conf is loaded:

  if [ -f ${pf_rules} ]; then
    pfctl -f ${pf_rules}
  fi

netstart(8) typically runs only for a brief period of time, so the use of the minimal default ruleset is barely noticeable for most users, except for the case where the ruleset /etc/pf.conf cannot be loaded, for instance due to a typographical mistake in the ruleset. In this case, the minimal default ruleset remains active, which does allow incoming SSH connections so the problem can be fixed remotely.

Basic operations

Adjusting output verbosity

The flags -q (quiet), -v (verbose), -vv (more verbose), and -g (regress test) can be used in combination with all other commands and affect the verbosity of the output a command produces.

The flag -r affects results that contain IP addresses. By default, addresses are shown numerically. With -r, reverse DNS lookups are performed and symbolic host names are shown instead, where available.

Combining commands

A single invocation of pfctl can execute multiple commands when command line arguments are combined, for instance:

  $ pfctl -e -f /etc/pf.conf

This both enables pf and loads the ruleset. Some combinations have different results depending on chronological order of execution. pfctl executes some combinations in reasonable order (instead of evaluating command line options strictly from left to right), but if there is any ambiguity, commands should be issued with separate pfctl invocations.

Enabling and disabling pf

pf can be enabled and disabled using:

  $ pfctl -e
  pf enabled

  $ pfctl -d
  pf disabled

When pf is disabled, no packets are passed to pf to decide whether they should be blocked or passed. This can be used to diagnose problems or compare performance.

It's not required to enable or disable pf to perform other operations, e.g. you don't need to disable pf before and re-enable it after a ruleset change.

When filtering statefully, disabling pf can break ongoing connections that are translated or use sequence number modulation. Also, pf cannot associate packets with state entries while disabled. When packets are missed, state entries do not advance their sequence number windows, and connections can stall and reset when pf is re-enabled and may require re-establishment.

A less intrusive way to diagnose pf related problems is to leave pf enabled but flush (clear) the ruleset. An empty ruleset will pass all packets due to the pass rule implied when no matching rule is found.

  $ pfctl -Fr -Fn
  rules cleared
  nat cleared

Packets with invalid checksums or IP options are blocked by default even with an empty ruleset. Diagnosis of such cases might require disabling pf.

The current state, enabled or disabled, is shown in the first line of the output of:

  $ pfctl -si
  Status: Enabled for 17 days 18:26:19          Debug: Urgent

pfctl operations, like loading rulesets or showing state entries, are possible even if pf is disabled. However, loading a ruleset does not automatically enable pf, an explicit pfctl -e is required.

Loading rulesets

Rulesets are loaded from files using:

  $ pfctl -f /etc/pf.conf

A file can be parsed but not loaded, for instance to check syntax validity, by adding -n:

  $ pfctl -n -f /etc/pf.conf

Adding -v makes the output more verbose, showing what rules would be loaded into the kernel:

  $ pfctl -n -v -f /etc/pf.conf

Instead of a file name, '-' can be used for standard input, e.g.

  $ echo "block all" | pfctl -nvf -
  block drop all

If the ruleset contains macros, their values can be supplied or overridden from the command line when the ruleset is loaded using the -D option, like:

  $ cat /etc/pf.conf
  pass out on $ext_if keep state

  $ pfctl -D 'ext_if=wi0' -vf /etc/pf.conf
  pass out on wi0 all keep state

Ruleset files like /etc/pf.conf can contain filter rules (pass or block), translation rules (nat, rdr, and binat), and options (like set limit states 10000) and pfctl -f processes all of them. In the kernel, filter and translation rules are stored separately, i.e. a ruleset contains a list of filter rules and a list of translation rules.

You can load only the filter rules, leaving the translation rules unchanged, using:

  $ pfctl -R -f /etc/pf.conf

Conversely, only translation rules are loaded with:

  $ pfctl -N -f /etc/pf.conf

To load only the options, but neither filter nor translation rules, use:

  $ pfctl -O -f /etc/pf.conf

This is needed when you want to change an option from the command line like:

  $ echo "set limit states 20000" | pfctl -O -f -

Without the -O, pfctl would treat the piped input as a complete ruleset and replace the filter and translation rules with empty lists.

To show the currently loaded translation and filter rules, use:

  $ pfctl -sn -sr

Or use -sn or -sr on its own to show either list of rules only.

Verbose output is produced by adding -v or -vv:

  $ pfctl -vvsr
  @74 pass in on kue0 inet proto tcp from any to port = smtp flags S/SA keep state
  [ Evaluations: 95196     Packets: 95284     Bytes: 33351097    States: 0     ]

The '@74' indicates the rule number, used as a reference by other commands.

The second line shows how many times the rule has been evaluated, how many packets the rule was last-matching for, the sum of the sizes of these packets, and how many states currently exist in the state table that were created by the rule.

There's no need to flush rules before loading a new ruleset like

  $ pfctl -Fr -Fn -f /etc/pf.conf

In fact, this not only wastes CPU cycles, but introduces a (brief) temporary state with no rules loaded, when packets might pass that both the old and the new ruleset would block. A simple invocation with -f is sufficient and safe: while the new ruleset is being uploaded to the kernel, the old ruleset is still in effect. Once the new ruleset is completely uploaded, the kernel switches the rulesets and releases the old set. Any packet, at any time, is either filtered by the entire old ruleset or the entire new ruleset. If the upload fails for any reason, the old ruleset remains intact and in effect.

There are no pfctl commands to add or remove individual rules from a loaded ruleset. However, the output of pfctl -sr is valid input for pfctl -f. For instance, additional rules can be inserted at the beginning or end of the ruleset using:

  $ (echo "pass quick on lo0"; pfctl -sr) | pfctl -f -
  $ (pfctl -sr; echo "block all") | pfctl -f -

By piping the output through standard text processing tools like head(1), tail(1), sed(1), or grep(1), you can manipulate rulesets in many ways.

Instead of adding and removing rules, it's often simpler to use constant rules which reference tables, and to manipulate the tables so the rules apply to different sets of addresses.

Note that loading a ruleset does not remove state entries created by previously used rulesets. For instance, if your currently loaded ruleset contains the rule

  pass in proto tcp to port ssh keep state

and you establish an SSH connection matching this rule and creating a state entry, the state entry will continue to exist and to pass packets related to that connection even after you have loaded another ruleset which does not contain a similar rule or even explicitly blocks such connections.

To flush existing state entries, explicitly use:

  $ pfctl -Fs

Managing state entries

To list the contents of the state table, use:

  $ pfctl -ss
  kue0 tcp -> FIN_WAIT_2:FIN_WAIT_2

The first column shows the interface the state was created for, except for states that are floating (not bound to interfaces), where 'self' is shown instead.

The second column shows the protocol of the connection, like tcp, udp, icmp, or other.

The following columns show the peers involved in the connection. Those can simply be two source and destination addresses (and ports, for tcp or udp) when the connection is not translated. When either source translation (nat or binat) or destination translation (rdr or binat) is used, a third address shows the original address before translation. The arrows ← and → indicate the direction of the connection (incoming and outgoing, respectively) from the point of view of the interface the state was created on.

The last column shows the condition the state is in, which determines the timeout value used to remove the state entry. For TCP states, this loosely resembles the TCP states shown by netstat -p tcp for the local peer.

Adding -v makes the output more verbose:

  $ pfctl -vss
  kue0 tcp <- TIME_WAIT:TIME_WAIT
   [3321306408 + 58242] wscale 0  [64544208 + 16656] wscale 0
   age 00:01:05, expires in 00:00:28, 10:9 pkts, 4626:1041 bytes, rule 74

For TCP connections, the second line shows the currently valid TCP sequence number windows, that is, the lowest and highest segment pf will let pass. The first number shows the highest segment acknowledged by the peer, the lower boundary of the window, and the second number is the window advertised by the peer. The sum of both numbers equals the upper boundary. If the connection uses TCP window scaling, the scaling factors of both peers are shown. A value of n means the factor is 2^n. The value 0 means a peer advertised its support of window scaling, but didn't want to scale its own windows (2^0 is factor 1). The windows in the square brackets are shown unscaled, that is, before any scaling factors are applied.

The third line shows the age of the state entry in hours, minutes, and seconds. Similarly, the time after which the entry will time out if no further packets match the entry is shown next. In the example, the condition of the connection is TIME_WAIT:TIME_WAIT, so the timeout value tcp.closed applies, which defaults to 90 seconds. The state entry expires in 28 seconds, because the last packet of the connection was seen 62 seconds ago. If no further packet matches this state entry, the entry will be marked for removal in 28 seconds. Marked entries are removed periodically, the default interval being 10 seconds. This explains how state entries can show up in pfctl -vss output as 'expires in 00:00:00' for several seconds before they finally vanish.
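The expiry arithmetic follows directly from the timeout value and the time since the last matching packet. A small sketch with the example's numbers (the helper name is invented for illustration):

```python
def expires_in(timeout, seconds_since_last_packet):
    # Remaining lifetime of a state entry if no further packet matches.
    return max(0, timeout - seconds_since_last_packet)

# tcp.closed defaults to 90 seconds, and the last packet of the
# connection was seen 62 seconds ago, so the entry expires in 28:
assert expires_in(90, 62) == 28

# Once the timeout has elapsed, the entry is merely *marked* for
# removal and may linger at 'expires in 00:00:00' until the next
# periodic purge (default interval 10 seconds):
assert expires_in(90, 95) == 0
```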

The “10:9 pkts” on the third line in the example indicates that 19 packets have matched the state entry so far, 10 in the same direction as the packet that created the state entry, and 9 in the opposite direction. Similarly, “4626:1041 bytes” means those former 10 packets contained a total of 4626 bytes and the latter 9 packets a total of 1041 bytes.

The last part, “rule 74”, shows the number of the “pass … keep state” rule that created the state entry. This number usually does not equal the line number of the rule in the ruleset file, due to rule expansion. Instead, the number corresponds to the rule numbers printed by pfctl -vvsr, like:

  $ pfctl -vvsr | grep '@74 '
  @74 pass in on kue0 inet proto tcp from any to any port = smtp flags S/SA keep state

More verbose output from pfctl -vvss includes an id and creator id of the state entry used by pfsync.

The state table can be flushed (cleared) with:

  $ pfctl -Fs

Individual entries can be killed (removed) with:

  $ pfctl -k <source>
  $ pfctl -k <source> -k <destination>

The first command kills all states originating from the given source address; the second kills all states from that source to the given destination. Depending on whether the state is for an incoming or outgoing connection, the arguments may have to be reversed. The -k option is not very versatile; not all kinds of states can be killed with it, which may require flushing the entire state table instead.

Managing queues

The currently defined queues can be listed with:

  $ pfctl -s queue
  queue q_max priority 7 
  queue q_hig priority 5 
  queue q_def priority 3 
  queue q_low priq( default ) 

Adding -v adds two lines of counters for each queue:

  $ pfctl -v -s queue
  queue q_low priq( default ) 
    [ pkts:    4174247  bytes: 1861178708  dropped pkts:  10382 bytes: 2318648 ]
    [ qlength:   0/ 50 ]

The 'pkts' counter shows how many packets were assigned to the queue, and 'bytes' is the sum of those packets' sizes. Similarly, 'dropped pkts' counts packets that were assigned to the queue but had to be dropped because the queue was full, together with the total size of those packets. 'qlength' shows the current fullness of the queue as the number of entries vs. the maximum number of entries.

Adding -vv makes pfctl show the same output as -v in an endless loop. Additionally, after the first pass, pfctl uses the differences between the counters of consecutive passes to print the average packet rate and throughput, like:

  queue q_low priq( default ) 
    [ pkts:    4177298  bytes: 1861897544  dropped pkts:  10382 bytes: 2318648 ]
    [ qlength:   0/ 50 ]
    [ measured:     4.6 packets/s, 10.24Kb/s ]

Managing tables

A list of all existing tables is printed by:

  $ pfctl -s Tables

An individual table, specified by -t, can be manipulated using the -T command.

Show all entries of a table:

  $ pfctl -t spammers -T show

Delete all entries from a table:

  $ pfctl -t spammers -T flush
  5 addresses deleted.

Add an entry to a table:

  $ pfctl -t spammers -T add 10.2.3.4
  1/1 addresses added.
  $ pfctl -t spammers -T add 10/8
  1/1 addresses added.
  $ pfctl -t spammers -T add '!10.1/16'
  1/1 addresses added.

Delete an entry from a table:

  $ pfctl -t spammers -T delete 10.2.3.4
  1/1 addresses deleted.

Test whether an address matches a table:

  $ pfctl -t spammers -T test 10.2.3.4
  1/1 addresses match.
  $ pfctl -t spammers -T test 10.1.2.3
  0/1 addresses match.
  $ pfctl -t spammers -vv -T test 10.1.2.3
  0/1 addresses match.     !

Multiple entries can be added, removed, or tested like:

  $ pfctl -t spammers -T add 10.2.3.4 10.2.3.5 10.2.3.6
  3/3 addresses added.

Instead of listing the entries on the command line, the list can be read from a file:

  $ cat file
  10.2.3.4
  10.2.3.5
  10.2.3.6
  $ pfctl -t spammers -T add -f file
  3/3 addresses added.

The following example searches the web server log for requests containing 'cmd.exe' (a common exploit attempt) and adds all (new) client addresses to a table:

  $ grep 'cmd\.exe' /var/www/logs/access.log | \
      cut -d ' ' -f 1 | sort -u | \
      pfctl -t weblog -T add -f -
  28/32 addresses added.

The table could be referenced by rules, for example, to block these clients, to redirect them to another server, or to queue replies to their web requests differently.
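As a sketch, a rule blocking these clients could look like the following (the interface name kue0 follows the earlier examples and is only an assumption):

  block in quick on kue0 from <weblog> to any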

Managing anchors

Since OpenBSD 3.6, anchors can be nested within other anchors, forming a hierarchy, similar to the tree of files, directories and subdirectories in a filesystem. In this analogy, the rules are files, and anchors are (sub)directories. The main ruleset is the root directory.

You can load a ruleset (a list of rules) into an anchor, just as you can create a number of files in a directory. Evaluating a ruleset corresponds to processing all files located in one directory.

When the main ruleset is evaluated for a packet, only the rules inside the main ruleset are automatically evaluated. If there are anchors containing rules, those rules are not automatically evaluated, unless there is an explicit call (like a function call) to them from the main ruleset.

There are two forms of calls that cause evaluation of anchors, the first one is:

  anchor "/foo" all

When rule evaluation reaches this rule, evaluation branches into the list of rules within anchor /foo, and evaluates them from first to last. Upon reaching the last rule within anchor /foo, evaluation returns to the caller and continues with the next rule after the anchor call in the caller's context.

Note that evaluation is not recursive. When anchor /foo contains sub-anchors, the lists of rules within those sub-anchors are not evaluated by the above call, only the rules directly within anchor /foo are.

The second form is:

  anchor "/foo/*" all

This call does not evaluate the list of rules in anchor /foo at all. Instead, all anchors within anchor /foo are traversed, and for each sub-anchor, the list of rules inside that sub-anchor is evaluated.

Again evaluation is not recursive. When the sub-anchors below anchor /foo contain sub-sub-anchors, the sub-sub-anchors are not evaluated, only the rules directly within the sub-anchors are.
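To illustrate the difference between the two forms, assume a hypothetical hierarchy where anchor /foo directly contains a rule passing ssh, and its sub-anchor /foo/bar contains a rule passing www. Then:

  # evaluates only the ssh rule (directly within /foo)
  anchor "/foo" all
  # evaluates only the www rule (within the sub-anchors of /foo)
  anchor "/foo/*" all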

Anchors can be used to dynamically change a ruleset (from a script, for instance) without reloading the entire main ruleset. When you regularly need to modify only a specific section of your main ruleset, you can move the rules of that section into an anchor, which you call from the main ruleset. Then you can modify the section by reloading the rules of the anchor, without ever touching the main ruleset again. Of course, anchors can also be empty (contain no rules). Calling an empty anchor from the main ruleset simply does nothing while the anchor is empty. However, you can later load rules into the anchor, and the main ruleset will then evaluate these rules automatically, without requiring a change to the main ruleset.
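For example, a script could maintain a blocklist in a dedicated anchor (the anchor name and address here are hypothetical examples):

  # replace the rules in anchor "blocklist" without touching the main ruleset
  $ echo "block in quick from 10.1.2.3 to any" | pfctl -a blocklist -f -
  # remove all rules from the anchor again
  $ pfctl -a blocklist -F rules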

Another example is authpf(8), which dynamically modifies the filter policy to allow traffic from authenticated users. You create an anchor /authpf directly below the main ruleset. For each user who authenticates, the program creates a sub-anchor below anchor /authpf, and the rules for that user are loaded into that sub-anchor. The hierarchy looks like this:

  /		the main ruleset
  /authpf	the anchor containing the user anchors
  /authpf/fred	an anchor for user fred
  /authpf/paul	an anchor for user paul

Every anchor can contain rules, as every directory can contain files. In this case, however, the anchor authpf does not contain any rules, it only contains other anchors (like a directory that only contains subdirectories, but no files). The purpose of the authpf anchor is merely to hold the user anchors, not to contain rules itself. The users' anchors could be created directly in the main ruleset, but the intermediate anchor helps keep anchors organized. Instead of cluttering the namespace in the main ruleset, which could contain other anchors not related to authpf, all anchors related to authpf are stored inside one dedicated anchor, and authpf is free to do whatever it wants within that part of the world.

In this case, we want to evaluate the rules within anchor /authpf/fred and /authpf/paul. Actually, we want to evaluate the rules within all sub-anchors directly below /authpf, since authpf will dynamically add and remove sub-anchors. Hence, we can use the second form of call from the main ruleset:

  anchor "/authpf/*" all

Anchor calls don't have to specify absolute paths to the destination, relative paths are possible, too:

  anchor "authpf" all
  anchor "authpf/fred" all
  anchor "../../authpf" all

For relative paths, the point of reference is the caller, i.e. if anchor /foo/bar/baz contains the rule which calls “../frobnitz”, the destination is /foo/bar/frobnitz (no matter from where /foo/bar/baz may have been called).

You can list all top-level anchors with:

  $ pfctl -s Anchors

Adding -v lists all anchors recursively:

  $ pfctl -v -s Anchors

To list the sub-anchors of a specific anchor:

  $ pfctl -a authpf -s Anchors

Adding -v lists all anchors below the specified anchor recursively:

  $ pfctl -a authpf -v -s Anchors

To load a ruleset into an anchor:

  $ pfctl -a authpf/fred -f freds_rules.txt

To show the filter rules within an anchor:

  $ pfctl -a authpf/fred -sr

Anchors can also contain tables. A table within an anchor is manipulated in the same way as a table in the main ruleset, the only difference is the additional -a option specifying the anchor:

  $ pfctl -a authpf/fred -t spammers -T add

Other tools

Several tools that help manage pf are available through the ports tree.


Similar to what top(1) does for processes, pftop (ports/sysutils/pftop) shows information about pf in a curses-based interface. This includes views listing state entries, rules, queues, and labels. The lists can be sorted by various criteria. For example, you can watch your state entries ordered by the current amount of bandwidth they pass, or quickly locate the oldest or newest state entries. Rules and associated counters are displayed in a compact way.


pfstat (ports/net/pfstat) accumulates the counters available from pfctl -si output and produces simple graphs using the gd library. For instance, you can visually compare packet rate, throughput and number of states over extended periods of time.


symon (ports/sysutils/symon) is a more generic monitoring tool. Distributed agents read various system parameters like memory usage, disk IO, network interface counters and pf counters. The data is sent to a central collector which stores it in a round robin database (RRD). rrdgraph can be used to generate a variety of graphs from the database. This tool is much more versatile than pfstat, and the underlying database is better suited for large amounts of data. Its ability to correlate pf statistics with other system measurements (like CPU usage) is especially useful. Some familiarity with RRD tools (like experience with MRTG) is required, though.

Copyright © 2004-2006 Daniel Hartmeier. Permission to use, copy, modify, and distribute this documentation for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
