Cisco ACI Preferred group, a pinch of inter-VRF leaking and L3Out

In the previous article we discussed the implementation of inter-VRF leaking using two regular EPGs. Naturally, it's possible to use an L3Out in a shared services design – for instance, to provide common Internet access. However, the ACI Contract white paper has a section that highlights a rather peculiar limitation with an L3Out:

“Due to CSCvm63145, an EPG in a preferred group can consume an inter-VRF contract, but cannot be a provider for an inter-VRF contract with an L3Out EPG as the consumer.”

There is no further explanation of this state of affairs. If you check out the defect itself, it sheds a bit more light on what goes wrong: if an EPG is a provider for an inter-VRF contract, then it cannot communicate within the Preferred Group because of some restrictive zoning filter. However, shouldn't the interaction between EPGs be governed by an explicit contract in the first place? Let's build such a setup and see for ourselves:

The Host emulates three entities: a provider of a service (Provider), a consumer of that service (L3Out) and some other endpoint (TestEPG) that is part of the Preferred Group in TestVrf1. The L3Out uses OSPF to exchange prefixes. 2.2.2.2/32 is expected to use the service located at 192.168.1.1. Meanwhile, both Provider and TestEPG are in the same subnet, and thus in the same BD.

Here is the configuration of the Access Policies section to provide physical connectivity:

resource "aci_vlan_pool" "TestPool" {
  name  = "TestPool"
  alloc_mode  = "static"
}
resource "aci_ranges" "TestRange" {
  vlan_pool_dn  = aci_vlan_pool.TestPool.id
  from = "vlan-1"
  to = "vlan-1000"
  alloc_mode = "static"
}
resource "aci_physical_domain" "PhysicalDomain" {
  name = "PhysicalDomain"
  relation_infra_rs_vlan_ns = aci_vlan_pool.TestPool.id
}
resource "aci_l3_domain_profile" "L3Domain" {
  name = "L3Domain"
  relation_infra_rs_vlan_ns = aci_vlan_pool.TestPool.id
}
resource "aci_attachable_access_entity_profile" "TestAAEP" {
    name = "TestAAEP"
}
resource "aci_aaep_to_domain" "PhysicalDomain-to-TestAAEP" {
  attachable_access_entity_profile_dn = aci_attachable_access_entity_profile.TestAAEP.id
  domain_dn = aci_physical_domain.PhysicalDomain.id
}
resource "aci_aaep_to_domain" "L3Domain-to-TestAAEP" {
  attachable_access_entity_profile_dn = aci_attachable_access_entity_profile.TestAAEP.id
  domain_dn = aci_l3_domain_profile.L3Domain.id
}
resource "aci_leaf_interface_profile" "TestInterfaceProfile" {
    name = "TestInterfaceProfile"
}
resource "aci_access_port_block" "TestAccessBlockSelector" {
  access_port_selector_dn = aci_access_port_selector.TestAccessPortSelector.id
  name = "TestAccessBlockSelector"
  from_card = "1"
  from_port = "2"
  to_card = "1"
  to_port = "4"
}
resource "aci_access_port_selector" "TestAccessPortSelector" {
    leaf_interface_profile_dn = aci_leaf_interface_profile.TestInterfaceProfile.id
    name = "TestAccessPortSelector"
    access_port_selector_type = "range"
    relation_infra_rs_acc_base_grp = aci_leaf_access_port_policy_group.TestAccessInterfacePolicy.id
}
resource "aci_leaf_access_port_policy_group" "TestAccessInterfacePolicy" {
    name = "TestAccessInterfaceProfile"
    relation_infra_rs_att_ent_p = aci_attachable_access_entity_profile.TestAAEP.id
}
resource "aci_leaf_profile" "TestSwitchProfile" {
  name = "TestSwitchProfile"
  leaf_selector {
    name = "LeafSelector"
    switch_association_type = "range"
    node_block {
      name  = "Block1"
      from_ = "101"
      to_   = "104"
    }
  }
  relation_infra_rs_acc_port_p = [aci_leaf_interface_profile.TestInterfaceProfile.id]
}

After that we can define a tenant containing the required EPGs and network entities:

resource "aci_tenant" "TestTenant" {
    name = "TestTenant"
}
resource "aci_vrf" "TestVrf1" {
    tenant_dn = aci_tenant.TestTenant.id
    name = "TestVrf1"
}
resource "aci_vrf" "TestVrf2" {
    tenant_dn = aci_tenant.TestTenant.id
    name = "TestVrf2"
}
resource "aci_bridge_domain" "TestBD1" {
    tenant_dn = aci_tenant.TestTenant.id
    name  = "TestBD1"
    relation_fv_rs_ctx = aci_vrf.TestVrf1.id
}
resource "aci_subnet" "ProviderSubnet" {
    parent_dn = aci_application_epg.Provider.id
    ip = "192.168.1.1/32"
    scope = ["public", "shared"]
    ctrl = ["no-default-gateway"]
}
resource "aci_subnet" "TestEPGSubnet" {
    parent_dn = aci_bridge_domain.TestBD1.id
    ip = "192.168.1.254/24"
    scope = ["public", "shared"]
}
resource "aci_application_profile" "TestAP" {
    tenant_dn = aci_tenant.TestTenant.id
    name = "TestAP"
}
resource "aci_application_epg" "Provider" {
    application_profile_dn = aci_application_profile.TestAP.id
    name = "Provider"
    relation_fv_rs_bd = aci_bridge_domain.TestBD1.id
    pref_gr_memb = "include"
}
resource "aci_application_epg" "TestEPG" {
    application_profile_dn = aci_application_profile.TestAP.id
    name = "TestEPG"
    relation_fv_rs_bd = aci_bridge_domain.TestBD1.id
    pref_gr_memb = "include"
}
resource "aci_epg_to_domain" "ProviderDomain" {
    application_epg_dn = aci_application_epg.Provider.id
    tdn = aci_physical_domain.PhysicalDomain.id
}
resource "aci_epg_to_domain" "TestEPGDomain" {
    application_epg_dn = aci_application_epg.TestEPG.id
    tdn = aci_physical_domain.PhysicalDomain.id
}
resource "aci_bulk_epg_to_static_path" "ProviderStaticPath" {
  application_epg_dn = aci_application_epg.Provider.id
  static_path {
    interface_dn = "topology/pod-1/paths-101/pathep-[eth1/2]"
    encap = "vlan-100"
  }
}
resource "aci_bulk_epg_to_static_path" "TestEPGStaticPath" {
  application_epg_dn = aci_application_epg.TestEPG.id
  static_path {
    interface_dn = "topology/pod-1/paths-101/pathep-[eth1/2]"
    encap = "vlan-101"
  }
}

Let’s define a generic contract that permits everything and assign it to Provider:

resource "aci_contract" "TestContract" {
    tenant_dn = aci_tenant.TestTenant.id
    name = "TestContract"
    scope = "tenant"
}
resource "aci_contract_subject" "TestSubject" {
    contract_dn = aci_contract.TestContract.id
    name = "TestSubject"
}
resource "aci_contract_subject_filter" "PermitIPSubj" {
  contract_subject_dn = aci_contract_subject.TestSubject.id
  filter_dn = aci_filter.PermitIPFilter.id
}
resource "aci_filter" "PermitIPFilter" {
    tenant_dn = aci_tenant.TestTenant.id
    name = "PermitIPFilter"
}
resource "aci_filter_entry" "PermitIPFilterEntry" {
    filter_dn = aci_filter.PermitIPFilter.id
    name = "demo_entry"
    d_to_port = "unspecified"
    ether_t = "ip"
}
resource "aci_application_epg" "Provider" {
    application_profile_dn = aci_application_profile.TestAP.id
    name = "Provider"
    relation_fv_rs_bd = aci_bridge_domain.TestBD1.id
    relation_fv_rs_prov = [aci_contract.TestContract.id]
    pref_gr_memb = "include"
}

Now we can set up the Host and verify that there is connectivity to the fabric. This way we make sure that the previous steps were successful and nothing has been missed.

Host# show run vrf Provider
interface Ethernet1/1.100
  vrf member Provider
vrf context Provider
  ip route 0.0.0.0/0 192.168.1.254
  address-family ipv4 unicast
Host#
Host# show vrf Provider interface 
Interface                 VRF-Name                        VRF-ID  Site-of-Origin
Ethernet1/1.100           Provider                             3  --
Host#
Host# show run interface e1/1.100
interface Ethernet1/1.100
  encapsulation dot1q 100
  mac-address 0000.0000.0001
  vrf member Provider
  ip address 192.168.1.1/24
Host#
Host# show run vrf TestEPG
interface Ethernet1/1.101
  vrf member TestEPG
vrf context TestEPG
  ip route 0.0.0.0/0 192.168.1.254
  address-family ipv4 unicast
Host#
Host# show vrf TestEPG interface
Interface                 VRF-Name                        VRF-ID  Site-of-Origin
Ethernet1/1.101           TestEPG                              5  --
Host#
Host# show run interface e1/1.101
interface Ethernet1/1.101
  encapsulation dot1q 101
  mac-address 0000.0000.0002
  vrf member TestEPG
  ip address 192.168.1.2/24

Since we use the same physical interface to connect to the fabric, the subinterfaces inherit its MAC address. In such a case ACI would incorrectly consider both IPs to be part of the same endpoint and, as a result, the same EPG. The fix is simple – use different MAC addresses, so we define them manually.

Host# ping 192.168.1.254 vrf Provider
PING 192.168.1.254 (192.168.1.254): 56 data bytes
64 bytes from 192.168.1.254: icmp_seq=0 ttl=63 time=1.145 ms
64 bytes from 192.168.1.254: icmp_seq=1 ttl=63 time=0.898 ms
64 bytes from 192.168.1.254: icmp_seq=2 ttl=63 time=1.008 ms
64 bytes from 192.168.1.254: icmp_seq=3 ttl=63 time=0.97 ms
64 bytes from 192.168.1.254: icmp_seq=4 ttl=63 time=1.023 ms

--- 192.168.1.254 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 0.898/1.008/1.145 ms
Host#
Host# ping 192.168.1.254 vrf TestEPG
PING 192.168.1.254 (192.168.1.254): 56 data bytes
64 bytes from 192.168.1.254: icmp_seq=0 ttl=63 time=1.24 ms
64 bytes from 192.168.1.254: icmp_seq=1 ttl=63 time=0.961 ms
64 bytes from 192.168.1.254: icmp_seq=2 ttl=63 time=1.021 ms
64 bytes from 192.168.1.254: icmp_seq=3 ttl=63 time=0.744 ms
64 bytes from 192.168.1.254: icmp_seq=4 ttl=63 time=0.785 ms

--- 192.168.1.254 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 0.744/0.95/1.24 ms

The last part of the configuration is to create the L3Out and assign a contract to it.

resource "aci_l3_outside" "L3Out" {
    tenant_dn = aci_tenant.TestTenant.id
    name = "L3Out"
    enforce_rtctrl = ["export", "import"]
    relation_l3ext_rs_ectx = aci_vrf.TestVrf2.id
    relation_l3ext_rs_l3_dom_att = aci_l3_domain_profile.L3Domain.id
}
resource "aci_logical_node_profile" "L3OutNodeProfile" {
    l3_outside_dn = aci_l3_outside.L3Out.id
    name = "L3OutNodeProfile"
}
resource "aci_logical_interface_profile" "L3OutLogicalInterfaceProfile" {
    logical_node_profile_dn = aci_logical_node_profile.L3OutNodeProfile.id
    name = "L3OutLogicalInterfaceProfile"
}
resource "aci_logical_node_to_fabric_node" "NodetoFabric" {
  logical_node_profile_dn = aci_logical_node_profile.L3OutNodeProfile.id
  tdn = "topology/pod-1/node-103"
  rtr_id = "1.1.1.1"
}
resource "aci_l3out_path_attachment" "InterfaceMapping" {
  logical_interface_profile_dn = aci_logical_interface_profile.L3OutLogicalInterfaceProfile.id
  target_dn = "topology/pod-1/paths-103/pathep-[eth1/3]"
  if_inst_t = "l3-port"
  encap = "unknown"
  addr = "192.168.2.254/24"
}
resource "aci_l3out_ospf_external_policy" "L3OutOSPF" {
  l3_outside_dn = aci_l3_outside.L3Out.id
  area_id = "0.0.0.0"
  area_type = "regular"
}
resource "aci_ospf_interface_policy" "L3OutOSPFPolicy" {
    tenant_dn = aci_tenant.TestTenant.id
    name = "L3OutOSPFPolicy"
    ctrl = ["mtu-ignore"]
    dead_intvl = "40"
    hello_intvl = "10"
}
resource "aci_l3out_ospf_interface_profile" "L3OutOSPFInterface" {
  logical_interface_profile_dn = aci_logical_interface_profile.L3OutLogicalInterfaceProfile.id
  relation_ospf_rs_if_pol = aci_ospf_interface_policy.L3OutOSPFPolicy.id
  auth_key = "key"
}
resource "aci_external_network_instance_profile" "Consumer" {
    l3_outside_dn = aci_l3_outside.L3Out.id
    name = "Consumer"
    relation_fv_rs_cons = [aci_contract.TestContract.id]
}
resource "aci_l3_ext_subnet" "ConsumerSubnet" {
  external_network_instance_profile_dn = aci_external_network_instance_profile.Consumer.id
  ip = "2.2.2.2/32"
  scope = ["import-rtctrl", "import-security", "shared-security", "shared-rtctrl"]
}

Let’s configure OSPF on Host to establish adjacency with ACI:

Host# show run vrf Consumer
interface loopback0
  vrf member Consumer
interface Ethernet1/2
  vrf member Consumer
vrf context Consumer
  address-family ipv4 unicast
router ospf 1
  vrf Consumer
Host#
Host# show vrf Consumer interface
Interface                 VRF-Name                        VRF-ID  Site-of-Origin
loopback0                 Consumer                             4  --
Ethernet1/2               Consumer                             4  --
Host#
Host# show run interface lo0
interface loopback0
  vrf member Consumer
  ip address 2.2.2.2/32
  ip router ospf 1 area 0.0.0.0
Host#
Host# show run interface e1/2
interface Ethernet1/2
  no switchport
  vrf member Consumer
  ip address 192.168.2.1/24
  ip ospf mtu-ignore
  ip router ospf 1 area 0.0.0.0

At this point the contract is applied only to Provider and the L3Out, so there should be connectivity between them. TestEPG, however, should be unreachable from Provider.

Host# ping 192.168.1.2 vrf Provider
PING 192.168.1.2 (192.168.1.2): 56 data bytes
36 bytes from 192.168.1.1: Destination Host Unreachable
Request 0 timed out
Request 1 timed out
Request 2 timed out
Request 3 timed out
Request 4 timed out
--- 192.168.1.2 ping statistics ---
5 packets transmitted, 0 packets received, 100.00% packet loss
Host#
Host# ping 192.168.1.1 vrf Consumer source 2.2.2.2
PING 192.168.1.1 (192.168.1.1) from 2.2.2.2: 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=252 time=1.691 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=252 time=1.489 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=252 time=1.529 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=252 time=1.525 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=252 time=1.533 ms

In order to reach Provider from the border leaf, there has to be a static route to that EPG that carries the necessary VNID rewrite and ClassID.

Leaf-103# show ip route vrf TestTenant:TestVrf2
<output omitted>
1.1.1.1/32, ubest/mbest: 2/0, attached, direct
    *via 1.1.1.1, Lo6, [0/0], 00:08:30, direct
    *via 1.1.1.1, Lo6, [0/0], 00:08:30, local, local
2.2.2.2/32, ubest/mbest: 1/0
    *via 192.168.2.1, Eth1/3, [110/5], 00:07:41, ospf-default, intra
192.168.1.1/32, ubest/mbest: 1/0, attached, direct, pervasive
    *via 10.0.96.64%overlay-1, [1/0], 00:03:54, static, tag 4294967292
192.168.2.0/24, ubest/mbest: 1/0, attached, direct
    *via 192.168.2.254, Eth1/3, [0/0], 00:08:27, direct
192.168.2.254/32, ubest/mbest: 1/0, attached
    *via 192.168.2.254, Eth1/3, [0/0], 00:08:27, local, local
Leaf-103#
Leaf-103# show ip route vrf TestTenant:TestVrf2 192.168.1.1/32 det
<output omitted>
192.168.1.1/32, ubest/mbest: 1/0, attached, direct, pervasive
    *via 10.0.96.64%overlay-1, [1/0], 00:15:41, static, tag 4294967292
         recursive next hop: 10.0.96.64/32%overlay-1
         vrf crossing information:  VNID:0x288000 ClassId:0x1562 Flush#:0x3

As you would expect, 0x288000 (2654208) is the VNID of TestVrf1, and the ClassID 0x1562 (5474) corresponds to the Provider EPG.
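
If you want to double-check these values yourself, the APIC object store can be queried directly. A quick sketch using moquery (assuming, as on recent releases, that fvCtx carries the VRF VNID in its scope attribute and fvAEPg carries the EPG pcTag):

admin@apic1:~> moquery -c fvCtx -f 'fv.Ctx.name=="TestVrf1"' | egrep 'dn|scope'
admin@apic1:~> moquery -c fvAEPg -f 'fv.AEPg.name=="Provider"' | egrep 'dn|pcTag'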

The external EPG on the L3Out also has a global pcTag (5475). Remember that an inter-VRF contract is always enforced on the consumer leaf? Well, ingress policy enforcement (a VRF-level knob) mandates applying contracts on the compute leaf instead of the border leaf. In our case the compute leaf is the provider leaf; to enforce the policy on its end, the provider leaf has to know the L3Out pcTag, and thus the L3Out EPG must have a global pcTag.
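
For reference, this knob is set per VRF. A minimal Terraform sketch, assuming the aci_vrf resource's pc_enf_dir attribute (the lab config above simply relies on the defaults):

resource "aci_vrf" "TestVrf1" {
    tenant_dn   = aci_tenant.TestTenant.id
    name        = "TestVrf1"
    pc_enf_pref = "enforced" # default: contracts are enforced in this VRF
    pc_enf_dir  = "ingress"  # default; "egress" would shift enforcement towards the border leaf
}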

Feeling confused? Cannot figure out where the policy is applied in the end? Let's see whether the border leaf enforces the policies or not:

Leaf-103# show zoning-rule scope 2818048
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+----------+------------------------+
| Rule ID | SrcEPG | DstEPG | FilterID |      Dir       |  operSt |  Scope  |           Name          |  Action  |        Priority        |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+----------+------------------------+
|   4102  |   0    |   0    | implarp  |    uni-dir     | enabled | 2818048 |                         |  permit  |   any_any_filter(17)   |
|   4099  |   0    |   0    | implicit |    uni-dir     | enabled | 2818048 |                         | deny,log |    any_any_any(21)     |
|   4098  |   0    |   15   | implicit |    uni-dir     | enabled | 2818048 |                         | deny,log |  any_vrf_any_deny(22)  |
|   4108  |  5474  |   0    | implicit |    uni-dir     | enabled | 2818048 |                         | deny,log | shsrc_any_any_deny(12) |
|   4111  |  5474  |  5475  |    4     | uni-dir-ignore | enabled | 2818048 | TestTenant:TestContract |  permit  |     fully_qual(7)      |
|   4110  |  5475  |  5474  |    4     |     bi-dir     | enabled | 2818048 | TestTenant:TestContract |  permit  |     fully_qual(7)      |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+----------+------------------------+

The rules in this table are responsible for overall filtering within TestVrf2:

  • ID 4102: permits ARP from any to any;
  • ID 4099: denies any traffic from any to any;
  • ID 4098: denies any traffic from any to 0.0.0.0/0 announced by the L3Out (added when Preferred Group is configured);
  • ID 4108: denies any traffic from Provider (which has a global pcTag) to any – always added in the consumer VRF to drop traffic that is not covered by a contract (the provider VRF just forwards the traffic);
  • ID 4110-4111: permit traffic between Provider and the L3Out EPG according to filter 4.

It seems we're done with the border leaf; let's jump over to the provider leaf.

Leaf-101# show ip route vrf TestTenant:TestVrf1 
<output omitted>
2.2.2.2/32, ubest/mbest: 1/0
    *via 10.0.88.68%overlay-1, [200/5], 00:20:18, bgp-65000, internal, tag 65000
192.168.1.0/24, ubest/mbest: 1/0, attached, direct, pervasive, dcs
    *via 10.0.96.64%overlay-1, [1/0], 00:16:31, static
192.168.1.1/32, ubest/mbest: 1/0, attached, direct, pervasive, dcs
    *via 10.0.96.64%overlay-1, [1/0], 00:17:58, static
192.168.1.254/32, ubest/mbest: 1/0, attached, pervasive
    *via 192.168.1.254, Vlan4, [0/0], 00:16:31, local, local

Leaf-101#
Leaf-101# show ip route vrf TestTenant:TestVrf1 2.2.2.2/32 det
<output omitted>
2.2.2.2/32, ubest/mbest: 1/0
    *via 10.0.88.68%overlay-1, [200/5], 00:20:28, bgp-65000, internal, tag 65000
         client-specific data: 1d      
         recursive next hop: 10.0.88.68/32%overlay-1
          BGP extended route information: BGP origin AS 65000 BGP peer AS 65000 rw-vnid: 0x2b0000 table-id: 0xe rw-mac: 0

The story is a bit different on the compute leaf. External prefixes are exchanged via MP-BGP within the fabric. The BGP updates announce the prefixes and the corresponding VNIDs, so there is no need for static pervasive routes to perform VNID rewrites. The ClassID, however, seems to be set statically, as there is no relevant information in the BGP output. Besides, the pcTag-to-prefix mapping can be obtained with a completely different command:

Leaf-101# show system internal policy-mgr prefix 
Requested prefix data

Vrf-Vni VRF-Id Table-Id Table-State  VRF-Name                    Addr                                Class Shared Remote Complete Svc_ena
======= ======  =========== =======  ============================ ================================= ====== ====== ====== ======== ========
2752512 7      0x7           Up     common:default                                       0.0.0.0/0   15      False  False  False    False   
2752512 7      0x80000007    Up     common:default                                            ::/0   15      False  False  False    False   
2654208 15     0x8000000f    Up     TestTenant:TestVrf1                                       ::/0   15      False  False  False    False   
2654208 15     0xf           Up     TestTenant:TestVrf1                                  0.0.0.0/0   15      False  False  False    False   
2654208 15     0xf           Up     TestTenant:TestVrf1                                  2.2.2.2/32  5475    True   True   False    False   

What about the contracts? Are they applied on the provider leaf as well, since a global pcTag is allocated for the L3Out EPG?

Leaf-101# show zoning-rule scope 2654208
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+-----------------+----------------------+
| Rule ID | SrcEPG | DstEPG | FilterID |      Dir       |  operSt |  Scope  |           Name          |      Action     |       Priority       |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+-----------------+----------------------+
|   4104  |   0    | 49153  | implicit |    uni-dir     | enabled | 2654208 |                         |      permit     |   any_dest_any(16)   |
|   4101  |   0    |   0    | implarp  |    uni-dir     | enabled | 2654208 |                         |      permit     |  any_any_filter(17)  |
|   4103  |   0    |   0    | implicit |    uni-dir     | enabled | 2654208 |                         |     deny,log    |   any_any_any(21)    |
|   4102  |   0    |   15   | implicit |    uni-dir     | enabled | 2654208 |                         |     deny,log    | any_vrf_any_deny(22) |
|   4113  |  5475  |  5474  |    4     |     bi-dir     | enabled | 2654208 | TestTenant:TestContract |      permit     |    fully_qual(7)     |
|   4115  |  5474  |   14   | implicit |    uni-dir     | enabled | 2654208 |                         | permit_override |    src_dst_any(9)    |
|   4111  |  5474  |  5475  |    4     | uni-dir-ignore | enabled | 2654208 | TestTenant:TestContract |      permit     |    fully_qual(7)     |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+-----------------+----------------------+

It seems that the compute leaf indeed enforces the contract along with the border leaf:

  • ID 4104: permits any traffic from any to TestBD1 – flooding within the BD;
  • ID 4101: permits ARP from any to any;
  • ID 4103: denies any traffic from any to any;
  • ID 4102: denies any traffic from any to 0.0.0.0/0 announced by the L3Out (added when Preferred Group is configured);
  • ID 4115: permits return traffic from Provider back to the consumer VRF;
  • ID 4111, 4113: permit traffic between Provider and the L3Out EPG according to filter 4.

This doesn't mean that the policy is enforced twice, though. As soon as a policy is applied, the SP and DP bits in the iVXLAN header are set, so there is no double effort. Ambiguity about the policy enforcement point – sure; a bit of wasted TCAM – probably; but there should be no double processing involved.

Back to the main topic though. Imagine that TestEPG has to communicate with Provider, and there is some kind of restriction that makes contracts unsuitable. Preferred Group seems to be the answer, since EPGs in that group do not need a contract to permit traffic between them. So far we've added the EPGs to the group, but the feature is not enabled at the VRF level, so there is no effect. Let's enable it in the GUI, as there seems to be no option to do it in Terraform (provider version 2.5.2).
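
For completeness, a hypothetical alternative: the VRF-level knob lives on the vzAny object, which the provider exposes via the aci_any resource in some releases. A hedged sketch, untested in this lab:

resource "aci_any" "TestVrf1Any" {
    vrf_dn       = aci_vrf.TestVrf1.id
    pref_gr_memb = "enabled" # VRF-level Preferred Group switch
}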

Did it break the connectivity as predicted by the white paper?

Host# ping 192.168.1.1 vrf Consumer source 2.2.2.2
PING 192.168.1.1 (192.168.1.1) from 2.2.2.2: 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=252 time=1.832 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=252 time=1.254 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=252 time=1.285 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=252 time=1.529 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=252 time=1.579 ms

--- 192.168.1.1 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 1.254/1.495/1.832 ms
Host#
Host# ping 192.168.1.254 vrf TestEPG
PING 192.168.1.254 (192.168.1.254): 56 data bytes
64 bytes from 192.168.1.254: icmp_seq=0 ttl=63 time=1.256 ms
64 bytes from 192.168.1.254: icmp_seq=1 ttl=63 time=0.943 ms
64 bytes from 192.168.1.254: icmp_seq=2 ttl=63 time=1.002 ms
64 bytes from 192.168.1.254: icmp_seq=3 ttl=63 time=1.02 ms
64 bytes from 192.168.1.254: icmp_seq=4 ttl=63 time=0.993 ms

--- 192.168.1.254 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 0.943/1.042/1.256 ms
Host#
Host# ping 192.168.1.2 vrf Provider
PING 192.168.1.2 (192.168.1.2): 56 data bytes
Request 0 timed out
Request 1 timed out
Request 2 timed out
Request 3 timed out
Request 4 timed out

--- 192.168.1.2 ping statistics ---
5 packets transmitted, 0 packets received, 100.00% packet loss

2.2.2.2/32 still maintains reachability to 192.168.1.1/32; Preferred Group, however, has no effect. Let's remove Provider from the contract:
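
In Terraform terms, this simply means dropping the relation_fv_rs_prov argument from the EPG defined earlier:

resource "aci_application_epg" "Provider" {
    application_profile_dn = aci_application_profile.TestAP.id
    name = "Provider"
    relation_fv_rs_bd = aci_bridge_domain.TestBD1.id
    # relation_fv_rs_prov = [aci_contract.TestContract.id] <-- removed
    pref_gr_memb = "include"
}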

Host# ping 192.168.1.2 vrf Provider
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=254 time=1.926 ms
64 bytes from 192.168.1.2: icmp_seq=1 ttl=254 time=1.484 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=1.248 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=1.272 ms
64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=1.521 ms

--- 192.168.1.2 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 1.248/1.49/1.926 ms

Doing so seems to enable Preferred Group at the cost of being unable to provide a shared contract to the L3Out. Here is the zoning table of TestVrf1 on the provider leaf now:

+---------+--------+--------+----------+---------+---------+---------+------+----------+----------------------------+
| Rule ID | SrcEPG | DstEPG | FilterID |   Dir   |  operSt |  Scope  | Name |  Action  |          Priority          |
+---------+--------+--------+----------+---------+---------+---------+------+----------+----------------------------+
|   4104  |   0    | 49153  | implicit | uni-dir | enabled | 2654208 |      |  permit  |      any_dest_any(16)      |
|   4101  |   0    |   0    | implarp  | uni-dir | enabled | 2654208 |      |  permit  |     any_any_filter(17)     |
|   4103  |   0    |   0    | implicit | uni-dir | enabled | 2654208 |      |  permit  | grp_any_any_any_permit(20) |
|   4102  |   0    |   15   | implicit | uni-dir | enabled | 2654208 |      | deny,log | grp_any_dest_any_deny(19)  |
|   4114  | 32770  |   0    | implicit | uni-dir | enabled | 2654208 |      | deny,log |  grp_src_any_any_deny(18)  |
+---------+--------+--------+----------+---------+---------+---------+------+----------+----------------------------+

As you can see, there are a few subtle changes in the zoning table. Take a look at rule ID 4103: there is a permit action instead of a deny. This is the effect of Preferred Group: traffic is permitted by default within the VRF. If we had more EPGs that were not part of the Preferred Group, their traffic would be explicitly denied. Traffic that enters the fabric from the L3Out is marked with the VRF pcTag (32770 here); such traffic is not part of the Preferred Group, so it should be dropped as well, hence rule ID 4114.

Now let's get back to the zoning table that was in effect moments ago, with the contract still applied:

+---------+--------+--------+----------+----------------+---------+---------+-------------------------+-----------------+----------------------------+
| Rule ID | SrcEPG | DstEPG | FilterID |      Dir       |  operSt |  Scope  |           Name          |      Action     |          Priority          |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+-----------------+----------------------------+
|   4104  |   0    | 49153  | implicit |    uni-dir     | enabled | 2654208 |                         |      permit     |      any_dest_any(16)      |
|   4101  |   0    |   0    | implarp  |    uni-dir     | enabled | 2654208 |                         |      permit     |     any_any_filter(17)     |
|   4103  |   0    |   0    | implicit |    uni-dir     | enabled | 2654208 |                         |      permit     | grp_any_any_any_permit(20) |
|   4102  |   0    |   15   | implicit |    uni-dir     | enabled | 2654208 |                         |     deny,log    | grp_any_dest_any_deny(19)  |
|   4113  |  5475  |  5474  |    4     |     bi-dir     | enabled | 2654208 | TestTenant:TestContract |      permit     |       fully_qual(7)        |
|   4115  |  5474  |   14   | implicit |    uni-dir     | enabled | 2654208 |                         | permit_override |       src_dst_any(9)       |
|   4111  |  5474  |  5475  |    4     | uni-dir-ignore | enabled | 2654208 | TestTenant:TestContract |      permit     |       fully_qual(7)        |
|   4112  |  5474  |   0    | implicit |    uni-dir     | enabled | 2654208 |                         |     deny,log    |  grp_src_any_any_deny(18)  |
|   4114  | 32770  |   0    | implicit |    uni-dir     | enabled | 2654208 |                         |     deny,log    |  grp_src_any_any_deny(18)  |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+-----------------+----------------------------+

If you compare the two tables – Preferred Group alone versus Preferred Group with the contract applied – you should notice an extra entry: ID 4112. This is the actual culprit: traffic from Provider to TestEPG matches this entry and gets dropped as a result (this is also noted in the defect description). A similar entry is described in the white paper; however, its priority differs (src_any_any_deny vs grp_src_any_any_deny). So far, I have not managed to find any explanation of what this entry actually means or why it is added.

There is almost no practical outcome though: the limitation is clearly stated in the documentation. Complex systems such as ACI should be implemented according to approved guidelines instead of relying on common sense and general knowledge. The only challenge is to stumble upon the guidelines that actually fit your requirements, or to read the whole documentation thoroughly. However, I hope that I've shared enough context around this defect to narrow it down from a mysterious restriction in a white paper to a single line in the zoning table.

Kudos for review: Anastasiia Kuraleva


ACI VRF leaking

Some people say that BGP is complicated, although I would argue that BGP is relatively straightforward, especially compared to OSPF. However, I have never met anyone who would claim that ACI is easy once marketing is put aside. ACLs and prefix-lists are covered in the CCNA track; ACI contracts, however, have a dedicated white paper. One of the biggest mysteries for me was the process of implementing inter-VRF contracts. Don't get me wrong – it's documented concisely; however, I always had difficulty understanding why those steps are required. Today I'd like to share a few observations on the topic.

The topology is minimal:

Host is an L3 switch that emulates both the provider (VRF Provider) and the consumer (VRF Consumer), using ACI as the default gateway:

Host# show run vrf Provider
interface Ethernet1/1.100
  vrf member Provider
vrf context Provider
  ip route 0.0.0.0/0 192.168.1.254
  address-family ipv4 unicast
ip route 0.0.0.0/0 192.168.1.254 vrf Provider

Host# show ip interface brief vrf Provider
IP Interface Status for VRF " Provider "(47)
Interface            IP Address      Interface Status
Eth1/1.100           192.168.1.1     protocol-up/link-up/admin-up       

Host# show run vrf Consumer
interface Ethernet1/2.100
  vrf member Consumer
vrf context Consumer
  ip route 0.0.0.0/0 192.168.2.254
  address-family ipv4 unicast
ip route 0.0.0.0/0 192.168.2.254 vrf Consumer

Host# show ip interface brief vrf Consumer
IP Interface Status for VRF " Consumer "(48)
Interface            IP Address      Interface Status
Eth1/2.100           192.168.2.1     protocol-up/link-up/admin-up

As for ACI, there are just two sets of an EPG, a BD and a VRF within the same tenant, plus the relevant infrastructure objects:

Tenant module:

resource "aci_tenant" "TestTenant" {
    name                = "TestTenant"
}
resource "aci_vrf" "TestVrf1" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestVrf1"
}
resource "aci_vrf" "TestVrf2" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestVrf2"
}
resource "aci_bridge_domain" "TestBD1" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestBD1"
    relation_fv_rs_ctx  = aci_vrf.TestVrf1.id
}
resource "aci_subnet" "Subnet1" {
    parent_dn        = aci_application_epg.Provider.id
    ip               = "192.168.1.254/24"
    scope            = ["private", "shared"]
}
resource "aci_bridge_domain" "TestBD2" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestBD2"
    relation_fv_rs_ctx  = aci_vrf.TestVrf2.id
}
resource "aci_subnet" "Subnet2" {
    parent_dn        = aci_bridge_domain.TestBD2.id
    ip               = "192.168.2.254/24"
    scope            = ["private", "shared"]
}
resource "aci_application_profile" "TestAP" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestAP"
}
resource "aci_application_epg" "Provider" {
    application_profile_dn  = aci_application_profile.TestAP.id
    name                    = "Provider"
    relation_fv_rs_bd       = aci_bridge_domain.TestBD1.id
}
resource "aci_application_epg" "Consumer" {
    application_profile_dn  = aci_application_profile.TestAP.id
    name                    = "Consumer"
    relation_fv_rs_bd       = aci_bridge_domain.TestBD2.id
}
resource "aci_epg_to_domain" "ProviderDomain" {
    application_epg_dn  = aci_application_epg.Provider.id
    tdn                 = aci_physical_domain.PhysicalDomain.id
}
resource "aci_epg_to_domain" "ConsumerDomain" {
    application_epg_dn  = aci_application_epg.Consumer.id
    tdn                 = aci_physical_domain.PhysicalDomain.id
}

Access Policies module:

resource "aci_vlan_pool" "TestPool" {
  name  = "TestPool"
  alloc_mode  = "static"
}
resource "aci_ranges" "range_1" {
  vlan_pool_dn  = aci_vlan_pool.TestPool.id
  from          = "vlan-1"
  to            = "vlan-1000"
  alloc_mode    = "static"
}
resource "aci_physical_domain" "PhysicalDomain" {
  name                      = "PhysicalDomain"
  relation_infra_rs_vlan_ns = aci_vlan_pool.TestPool.id
}
resource "aci_attachable_access_entity_profile" "TestAAEP" {
    name                    = "TestAAEP"
}
resource "aci_aaep_to_domain" "PhysicalDomain-to-TestAAEP" {
  attachable_access_entity_profile_dn = aci_attachable_access_entity_profile.TestAAEP.id
  domain_dn                           = aci_physical_domain.PhysicalDomain.id
}
resource "aci_leaf_interface_profile" "TestInterfaceProfile" {
    name        = "TestInterfaceProfile"
}
resource "aci_access_port_block" "TestAccessBlockSelector" {
  access_port_selector_dn = aci_access_port_selector.TestAccessPortSelector.id
  name                    = "TestAccessBlockSelector"
  from_card               = "1"
  from_port               = "2"
  to_card                 = "1"
  to_port                 = "2"
}
resource "aci_access_port_selector" "TestAccessPortSelector" {
    leaf_interface_profile_dn       = aci_leaf_interface_profile.TestInterfaceProfile.id
    name                            = "TestAccessPortSelector"
    access_port_selector_type       = "range"
    relation_infra_rs_acc_base_grp  = aci_leaf_access_port_policy_group.TestAccessInterfacePolicy.id
}
resource "aci_leaf_access_port_policy_group" "TestAccessInterfacePolicy" {
    name                        = "TestAccessInterfaceProfile"
    relation_infra_rs_att_ent_p = aci_attachable_access_entity_profile.TestAAEP.id
}
resource "aci_leaf_profile" "TestSwitchProfile" {
  name        = "TestSwitchProfile"
  leaf_selector {
    name                    = "LeafSelector"
    switch_association_type = "range"
    node_block {
      name  = "Block1"
      from_ = "101"
      to_   = "102"
    }
  }
  relation_infra_rs_acc_port_p = [aci_leaf_interface_profile.TestInterfaceProfile.id]
}

Notice that, in general, the provider subnet has to be defined under the EPG, not the BD. Since we use two different EPGs, we have to define a contract between them, although we can keep it as permissive as possible:

Contract module:

resource "aci_application_epg" "Provider" {
    application_profile_dn  = aci_application_profile.TestAP.id
    name                    = "Provider"
    relation_fv_rs_bd       = aci_bridge_domain.TestBD1.id
    relation_fv_rs_prov     = [aci_contract.TestContract.id]
}
resource "aci_application_epg" "Consumer" {
    application_profile_dn  = aci_application_profile.TestAP.id
    name                    = "Consumer"
    relation_fv_rs_bd       = aci_bridge_domain.TestBD2.id
    relation_fv_rs_cons     = [aci_contract.TestContract.id]
}
resource "aci_contract" "TestContract" {
    tenant_dn   =  aci_tenant.TestTenant.id
    name        = "TestContract"
    scope       = "tenant"
}
resource "aci_contract_subject" "TestSubject" {
    contract_dn   = aci_contract.TestContract.id
    name          = "TestSubject"
}
resource "aci_contract_subject_filter" "PermitIPSubj" {
  contract_subject_dn  = aci_contract_subject.TestSubject.id
  filter_dn  = aci_filter.PermitIPFilter.id
}
resource "aci_filter" "PermitIPFilter" {
    tenant_dn   = aci_tenant.TestTenant.id
    name        = "PermitIPFilter"
}
resource "aci_filter_entry" "PermitIPFilterEntry" {
    filter_dn     = aci_filter.PermitIPFilter.id
    name          = "permit_ip"
    ether_t       = "ip"
}

As soon as we deploy this config, Consumer should be able to reach Provider:

Host# traceroute 192.168.1.1 vrf Consumer
traceroute to 192.168.1.1 (192.168.1.1), 30 hops max, 40 byte packets
 1  192.168.2.254 (192.168.2.254)  1.946 ms  0.758 ms  0.691 ms
 2  192.168.1.254 (192.168.1.254)  2.231 ms  0.708 ms  0.705 ms
 3  192.168.1.1 (192.168.1.1)  0.708 ms  0.577 ms  0.578 ms

The setup is correct, so we can switch to observations. Why is it necessary to define the subnet under the provider EPG instead of the relevant BD? There is no similar step in an L3VPN inter-VRF leaking configuration, so it must be an ACI-specific part. Let's see how the forwarding is done:

leaf-102# show ip route vrf TestTenant:TestVrf2
<output omitted>
192.168.1.0/24, ubest/mbest: 1/0, attached, direct, pervasive
    *via 10.0.88.66%overlay-1, [1/0], 00:07:29, static, tag 4294967294
192.168.2.0/24, ubest/mbest: 1/0, attached, direct, pervasive, dcs
    *via 10.0.88.66%overlay-1, [1/0], 00:11:01, static, tag 4294967294
192.168.2.254/32, ubest/mbest: 1/0, attached, pervasive
    *via 192.168.2.254, Vlan11, [0/0], 00:11:01, local, local
leaf-102#
leaf-102# show ip route vrf TestTenant:TestVrf2 192.168.1.0/24 det
<output omitted>
192.168.1.0/24, ubest/mbest: 1/0, attached, direct, pervasive
    *via 10.0.88.66%overlay-1, [1/0], 00:07:38, static, tag 4294967294
         recursive next hop: 10.0.88.66/32%overlay-1
         vrf crossing information:  VNID:0x238000 ClassId:0x2ab4 Flush#:0x1

Notice that Provider is reachable via a static route with a few interesting attributes. First, look at the next hop – it is the anycast IP for the IPv4 hardware proxy:

spine-201# show ip interface lo9
IP Interface Status for VRF "overlay-1"
lo9, Interface status: protocol-up/link-up/admin-up, iod: 81, mode: anycast-v4
  IP address: 10.0.88.66, IP subnet: 10.0.88.66/32  
  IP broadcast address: 255.255.255.255
  IP primary address route-preference: 0, tag: 0

In order to have the proxy process the packet in the correct VRF, the consumer leaf performs a VNID rewrite first, placing the packet into the provider VRF (0x238000 = 2326528).

Side note: this is the opposite of a regular VXLAN fabric based on NX-OS (excluding the downstream VNI feature, of course).

An inter-VRF contract is ALWAYS applied on the consumer leaf. However, such behaviour should break the regular conversation-based policy enforcement: the consumer initiates the flow, so it cannot have received a packet from the provider to learn its pcTag. The solution is obvious – the consumer has to know the provider pcTag in advance. This is why the subnet has to be configured under the provider EPG: as soon as a contract is applied, APIC instructs the consumer leaf to install the static route with the VNID rewrite and the provider pcTag, listed in the RIB as ClassID (0x2ab4 = 10932).

As a result, the consumer leaf has all the necessary information to forward the packet to the provider EPG and apply the correct policies:

leaf-102# show zoning-rule scope 2719744
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+----------+------------------------+
| Rule ID | SrcEPG | DstEPG | FilterID |      Dir       |  operSt |  Scope  |           Name          |  Action  |        Priority        |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+----------+------------------------+
|   4101  |   0    |   15   | implicit |    uni-dir     | enabled | 2719744 |                         | deny,log |  any_vrf_any_deny(22)  |
|   4100  |   0    |   0    | implarp  |    uni-dir     | enabled | 2719744 |                         |  permit  |   any_any_filter(17)   |
|   4099  |   0    |   0    | implicit |    uni-dir     | enabled | 2719744 |                         | deny,log |    any_any_any(21)     |
|   4098  |   0    | 49153  | implicit |    uni-dir     | enabled | 2719744 |                         |  permit  |    any_dest_any(16)    |
|   4102  | 10932  | 49154  |    4     | uni-dir-ignore | enabled | 2719744 | TestTenant:TestContract |  permit  |     fully_qual(7)      |
|   4103  | 49154  | 10932  |    4     |     bi-dir     | enabled | 2719744 | TestTenant:TestContract |  permit  |     fully_qual(7)      |
|   4104  | 10932  |   0    | implicit |    uni-dir     | enabled | 2719744 |                         | deny,log | shsrc_any_any_deny(12) |
+---------+--------+--------+----------+----------------+---------+---------+-------------------------+----------+------------------------+

What about the return flow back from provider EPG?

leaf-101# show ip route vrf TestTenant:TestVrf1 192.168.2.0/24 det
<output omitted>
192.168.2.0/24, ubest/mbest: 1/0, attached, direct, pervasive
    *via 10.0.88.66%overlay-1, [1/0], 00:01:13, static, tag 4294967294
         recursive next hop: 10.0.88.66/32%overlay-1
         vrf crossing information:  VNID:0x298000 ClassId:0 Flush#:0

As you could have guessed, there is a corresponding pervasive route back to the consumer EPG:

  1. It points to Anycast IPv4 hardware proxy address;
  2. It performs VNID rewrite.

However, the ClassID is zero. Does it mean no filtering is done on the provider leaf? Indeed:

leaf-101# show zoning-rule scope 2326528
+---------+--------+--------+----------+---------+---------+---------+------+-----------------+----------------------+
| Rule ID | SrcEPG | DstEPG | FilterID |   Dir   |  operSt |  Scope  | Name |      Action     |       Priority       |
+---------+--------+--------+----------+---------+---------+---------+------+-----------------+----------------------+
|   4101  |   0    | 16387  | implicit | uni-dir | enabled | 2326528 |      |      permit     |   any_dest_any(16)   |
|   4098  |   0    |   0    | implicit | uni-dir | enabled | 2326528 |      |     deny,log    |   any_any_any(21)    |
|   4099  |   0    |   0    | implarp  | uni-dir | enabled | 2326528 |      |      permit     |  any_any_filter(17)  |
|   4100  |   0    |   15   | implicit | uni-dir | enabled | 2326528 |      |     deny,log    | any_vrf_any_deny(22) |
|   4102  | 10932  |   14   | implicit | uni-dir | enabled | 2326528 |      | permit_override |    src_dst_any(9)    |
+---------+--------+--------+----------+---------+---------+---------+------+-----------------+----------------------+

The non-zero pcTag values in this zoning table are either reserved values (14 and 15) or correspond to BDs (16387 is TestBD1).
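
Again, the BD pcTag can be confirmed from the APIC CLI (assuming fvBD exposes pcTag, as it does on recent releases):

admin@apic1:~> moquery -c fvBD -f 'fv.BD.name=="TestBD1"' | egrep 'dn|pcTag'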

I’ll leave it up to you to decode the rest of the entries in the zoning table (you might want to check out this section first).

It should be highlighted that inter-VRF leaking disables endpoint learning in both directions. Such an approach ensures that leafs use only the pervasive routes to forward the traffic, and as a result they rewrite VNIDs and apply the correct policies. There is a subtle implication though: inter-VRF traffic always passes through the spines, even if both provider and consumer are connected to the same leaf.

I hope you can see now that ACI is a very complicated system with a lot of inner nuances. That's not necessarily a bad thing; after all, computers are way more complex than stone arrows have ever been. However, as an ACI operator, you'd better keep such complexity in mind and stick to approved designs after having thoroughly tested their performance and functionality. Otherwise, you might find yourself in terra incognita, facing the grave necessity to redesign your production system from scratch.

Kudos for review: Anastasiia Kuraleva


Cisco ACI FTAG: trees inside

Unicast forwarding within a Cisco ACI fabric is extensively described in various sources, although there is a decent amount of work involved in digesting such a volume of information. BUM forwarding in the overlay is also covered; however, there is little information about what happens in the underlay at the same time. An entity called FTAG is involved: one spine is selected as the root of a tree while all leafs join that tree. There are 12 FTAGs for redundancy and load-balancing purposes.

Although the description above seems reasonable, it raises a few questions:

  1. What happens to BUM traffic when an uplink on a leaf fails? Since only one spine participates in the tree, the traffic should be dropped – not very resilient, eh?
  2. Consider the Remote Leaf scenario: RLs use the anycast address on the spines to send BUM traffic. What happens if the traffic lands on a spine that is not participating in that specific FTAG? The same question is valid for the Multi-Site architecture as well.

There is a decent Cisco Live session that helps to cover the points above. The lab setup is relatively simple today:

The only thing we need here is some basic configuration to allow two endpoints in the same EPG to communicate with each other.

Tenant module:

resource "aci_tenant" "TestTenant" {
    name                = "TestTenant"
}
resource "aci_vrf" "TestVrf" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestVrf"
}
resource "aci_bridge_domain" "TestBD1" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestBD1"
    relation_fv_rs_ctx  = aci_vrf.TestVrf.id
}
resource "aci_subnet" "Subnet1" {
    parent_dn        = aci_bridge_domain.TestBD1.id
    ip               = "192.168.0.254/24"
    scope            = ["private", "shared"]
}
resource "aci_application_profile" "TestAP" {
    tenant_dn           = aci_tenant.TestTenant.id
    name                = "TestAP"
}
resource "aci_application_epg" "TestEPG1" {
    application_profile_dn  = aci_application_profile.TestAP.id
    name                    = "TestEPG1"
    relation_fv_rs_bd       = aci_bridge_domain.TestBD1.id
}
resource "aci_epg_to_domain" "EPG1Domain" {
    application_epg_dn  = aci_application_epg.TestEPG1.id
    tdn                 = aci_physical_domain.PhysicalDomain.id
}
resource "aci_bulk_epg_to_static_path" "TestEPG1StaticPath" {
  application_epg_dn = aci_application_epg.TestEPG1.id
  static_path {
    interface_dn         = "topology/pod-1/paths-101/pathep-[eth1/2]"
    encap                = "vlan-101"
  }
  static_path {
    interface_dn         = "topology/pod-1/paths-104/pathep-[eth1/2]"
    encap                = "vlan-101"
  }
}

Access policies module:

resource "aci_vlan_pool" "TestPool" {
  name  = "TestPool"
  description = "From Terraform"
  alloc_mode  = "static"
}
resource "aci_ranges" "TestRange" {
  vlan_pool_dn  = aci_vlan_pool.TestPool.id
  from          = "vlan-1"
  to            = "vlan-1000"
  alloc_mode    = "static"
}
resource "aci_physical_domain" "PhysicalDomain" {
  name                      = "PhysicalDomain"
  relation_infra_rs_vlan_ns = aci_vlan_pool.TestPool.id
}
resource "aci_attachable_access_entity_profile" "TestAAEP" {
    name                    = "TestAAEP"
}
resource "aci_aaep_to_domain" "PhysicalDomain-to-TestAAEP" {
  attachable_access_entity_profile_dn = aci_attachable_access_entity_profile.TestAAEP.id
  domain_dn                           = aci_physical_domain.PhysicalDomain.id
}
resource "aci_leaf_interface_profile" "TestInterfaceProfile" {
    name        = "TestInterfaceProfile"
}
resource "aci_access_port_block" "TestAccessBlockSelector" {
  access_port_selector_dn = aci_access_port_selector.TestAccessPortSelector.id
  name                    = "TestAccessBlockSelector"
  from_card               = "1"
  from_port               = "2"
  to_card                 = "1"
  to_port                 = "2"
}
resource "aci_access_port_selector" "TestAccessPortSelector" {
    leaf_interface_profile_dn       = aci_leaf_interface_profile.TestInterfaceProfile.id
    name                            = "TestAccessPortSelector"
    access_port_selector_type       = "range"
    relation_infra_rs_acc_base_grp  = aci_leaf_access_port_policy_group.TestAccessInterfacePolicy.id
}
resource "aci_leaf_access_port_policy_group" "TestAccessInterfacePolicy" {
    name                        = "TestAccessInterfaceProfile"
    relation_infra_rs_att_ent_p = aci_attachable_access_entity_profile.TestAAEP.id
}
resource "aci_leaf_profile" "TestSwitchProfile" {
  name        = "TestSwitchProfile"
  leaf_selector {
    name                    = "LeafSelector"
    switch_association_type = "range"
    node_block {
      name  = "Leaf101"
      from_ = "101"
      to_   = "101"
    }
    node_block {
      name  = "Leaf104"
      from_ = "104"
      to_   = "104"
    }
  }
  relation_infra_rs_acc_port_p = [aci_leaf_interface_profile.TestInterfaceProfile.id]
}

If you look through the session, you'll notice that the FTAG topology looks slightly different: a spine that is not the root of an FTAG still connects to that FTAG through a single leaf. Remember that FTAG root election is done via an IS-IS extension? Here is some CLI output:

Spine202# show isis internal mcast routes ftag
<output omitted>
 FTAG ID:   0 [Root] [Enabled] Cost:(   2/  13/   0)
 ----------------------------------
    Root port: Ethernet1/1.68
    OIF List:

 FTAG ID:   1 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.70
      Ethernet1/4.69

 FTAG ID:   2 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.70
      Ethernet1/4.69

 FTAG ID:   3 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.70
      Ethernet1/4.69

 FTAG ID:   4 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.70
      Ethernet1/4.69

 FTAG ID:   5  [Enabled] Cost:(   2/   7/   0)
 ----------------------------------
    Root port: Ethernet1/3.70
    OIF List:

 FTAG ID:   6  [Enabled] Cost:(   2/   8/   0)
 ----------------------------------
    Root port: Ethernet1/2.67
    OIF List:

 FTAG ID:   7  [Enabled] Cost:(   2/   9/   0)
 ----------------------------------
    Root port: Ethernet1/2.67
    OIF List:

 FTAG ID:   8  [Enabled] Cost:(   2/   8/   0)
 ----------------------------------
    Root port: Ethernet1/3.70
    OIF List:

 FTAG ID:   9  [Enabled] Cost:(   2/   7/   0)
 ----------------------------------
    Root port: Ethernet1/4.69
    OIF List:

 FTAG ID:  10  [Enabled] Cost:(   2/  12/   0)
 ----------------------------------
    Root port: Ethernet1/1.68
    OIF List:

 FTAG ID:  11 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.70
      Ethernet1/4.69

 FTAG ID:  12 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.70
      Ethernet1/4.69
  FTAG ID:  13 [Disabled]
  FTAG ID:  14 [Disabled]
  FTAG ID:  15 [Disabled]
Spine201# show isis internal mcast route ftag
<output omitted>
 FTAG ID:   0 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.69
      Ethernet1/4.70

 FTAG ID:   1  [Enabled] Cost:(   2/   8/   0)
 ----------------------------------
    Root port: Ethernet1/2.67
    OIF List:

 FTAG ID:   2  [Enabled] Cost:(   2/   9/   0)
 ----------------------------------
    Root port: Ethernet1/2.67
    OIF List:

 FTAG ID:   3  [Enabled] Cost:(   2/   8/   0)
 ----------------------------------
    Root port: Ethernet1/3.69
    OIF List:

 FTAG ID:   4  [Enabled] Cost:(   2/   8/   0)
 ----------------------------------
    Root port: Ethernet1/4.70
    OIF List:

 FTAG ID:   5 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.69
      Ethernet1/4.70

 FTAG ID:   6 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.69
      Ethernet1/4.70

 FTAG ID:   7 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.69
      Ethernet1/4.70

 FTAG ID:   8 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.69
      Ethernet1/4.70

 FTAG ID:   9 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.69
      Ethernet1/4.70

 FTAG ID:  10 [Root] [Enabled] Cost:(   0/   0/   0)
 ----------------------------------
    Root port: -
    OIF List:
      Ethernet1/1.68
      Ethernet1/2.67
      Ethernet1/3.69
      Ethernet1/4.70

 FTAG ID:  11  [Enabled] Cost:(   2/  12/   0)
 ----------------------------------
    Root port: Ethernet1/1.68
    OIF List:

 FTAG ID:  12  [Enabled] Cost:(   2/  13/   0)
 ----------------------------------
    Root port: Ethernet1/1.68
    OIF List:
  FTAG ID:  13 [Disabled]
  FTAG ID:  14 [Disabled]
  FTAG ID:  15 [Disabled]

Notice that a non-root spine actually participates in the FTAG anyway (a root port instead of an OIF list). The GIPo address of the group that corresponds to the BD is 225.0.69.80; since the FTAG ID is carried in the last four bits of the outer multicast address, and 80 (0x50) has those bits set to zero, this address should map to FTAG 0.

Such a topology now gives us a straight answer to the first question: if the Leaf104 uplink towards Spine201 fails, the leaf can still use Spine202 to forward traffic within FTAG 0:

Spine202# show isis internal mcast routes ftag
<output omitted>
 FTAG ID:   0 [Root][DEFERED] [Enabled] Cost:(   2/  13/   0)
 ----------------------------------
    Root port: Ethernet1/1.68
    OIF List:
      Ethernet1/4.69        <--------- link towards Leaf104
<output omitted>
Leaf104# show isis internal mcast routes ftag
<output omitted>
 FTAG ID:   0  [Enabled] Cost:(   3/  13/   0)
 ----------------------------------
    Root port: Ethernet1/50.12        <--------- link towards Spine202
    OIF List:
<output omitted>

The answer to the second question is also clear: if the traffic lands on a non-root spine, it is simply forwarded through a leaf towards the FTAG root spine.

Spine202# show isis internal mcast route gipo
<output omitted>
 GIPo: 225.0.69.80 [LOCAL]
    OIF List:
      Ethernet1/1.68
      Ethernet1/4.70
      Tunnel4        <--------- Multi-Site tunnel
<output omitted>

There should also be some mechanism to prevent the traffic from looping back to either the Remote Leafs or a different site, although I haven't found any details on how exactly it is implemented (my money is on some indicator bit in the iVXLAN header).

Does it have any operational impact? I don't think so; after all, ACI is a solution-grade offering that hides inner complexities from the operator, compared to a DIY approach. However, I believe it's a good idea to understand the inner details of the system and be able to stitch together the approaches from different stages of the product's evolution, at least from a high-level perspective.

Kudos for review: Anastasiia Kuraleva


Cisco ACI external L2 connectivity using EPGs and L2Out

Today I would like to share my experience with configuring external L2 connectivity in Cisco ACI. As the reader probably knows, there are two approaches: either treat the external L2 segment as an EPG itself, or configure an External Bridged Network object (aka L2Out).

The lab has the following setup:

SW1 and SW2 are Nexus 3000 switches that are used as hosts, thus requiring nothing more than a routed interface setup. APIC runs 4.2(7l) firmware in this lab; the specific switch models are irrelevant for the purpose of this discussion. Access policies and related entities are preconfigured, since they are out of the scope of this article.

Let’s create the network objects required for EPG connectivity: VRF and bridge domain (BD).

Figure 2. VRF setting – everything left as default

Figure 3. First step of BD config – all is default as well

Since we don’t require L3 connectivity, we can disable unicast routing within the BD, as well as uRPF. As for the other options, they can be left at their defaults:

Figure 4. BD config, step 2
Figure 5. BD config, step 3
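
By the way, these objects can also be kept in Terraform, in the same style as the access policies earlier in this article. A minimal sketch (the tenant and object names are my own; attribute names are per the aci provider docs):

resource "aci_tenant" "TestTenant" {
  name = "TestTenant"
}
resource "aci_vrf" "TestVrf" {
  tenant_dn = aci_tenant.TestTenant.id
  name      = "TestVrf"
}
resource "aci_bridge_domain" "TestBD" {
  tenant_dn          = aci_tenant.TestTenant.id
  name               = "TestBD"
  relation_fv_rs_ctx = aci_vrf.TestVrf.id
  # Pure L2 BD: unicast routing disabled, as in Figure 4
  unicast_route      = "no"
}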

Now we can create an application profile (AP), TestAP, and get down to enabling L2 connectivity using the first method: assigning ports to EPGs SW1 and SW2 and creating a contract.

Figure 6. Example of EPG config – SW1. Everything is default

It’s also important not to forget to associate the EPGs with domains; otherwise the policy will not be deployed. Let’s assign the EPG to the corresponding physical ports:

Figure 7. Static assignment of physical port to EPG SW1
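
A rough Terraform equivalent of the EPG part; the leaf path and encap VLAN are placeholders from my lab, and EPG SW2 would be defined analogously:

resource "aci_application_profile" "TestAP" {
  tenant_dn = aci_tenant.TestTenant.id
  name      = "TestAP"
}
resource "aci_application_epg" "SW1" {
  application_profile_dn = aci_application_profile.TestAP.id
  name                   = "SW1"
  relation_fv_rs_bd      = aci_bridge_domain.TestBD.id
}
# Without the domain association the policy is not deployed
resource "aci_epg_to_domain" "SW1-to-PhysicalDomain" {
  application_epg_dn = aci_application_epg.SW1.id
  tdn                = aci_physical_domain.PhysicalDomain.id
}
# Static binding of the leaf port facing SW1; path and VLAN are placeholders
resource "aci_epg_to_static_path" "SW1-port" {
  application_epg_dn = aci_application_epg.SW1.id
  tdn                = "topology/pod-1/paths-101/pathep-[eth1/2]"
  encap              = "vlan-100"
  mode               = "untagged"
}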

Having done everything (almost) on the ACI side, we can switch over to configuring the N3Ks:

SW1(config)# interface ethernet1/46
SW1(config-if)# shutdown
SW1(config-if)# no switchport
SW1(config-if)# ip address 192.168.0.1/24
SW1(config-if)# mac-address 0000.0000.0001
SW1(config-if)# no shutdown

SW2(config)# interface ethernet1/46
SW2(config-if)# shutdown
SW2(config-if)# no switchport
SW2(config-if)# ip address 192.168.0.2/24
SW2(config-if)# mac-address 0000.0000.0002
SW2(config-if)# no shutdown

MAC addresses are assigned specific values to make it easier to distinguish the endpoint entries. IP addresses cannot be used for this purpose in our setup because we have disabled unicast routing on the BD, and the traffic is switched rather than routed.

This is still not enough to enable connectivity between SW1 and SW2, though, because ACI employs a whitelisting model via contracts. Let’s configure bidirectional ICMP connectivity:

Figure 8. Contract filter
Figure 9. Contract subject
Figure 10. Contract itself
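
A hedged Terraform sketch of Figures 8-10 (object names are mine; the exact relation syntax may vary between provider versions):

resource "aci_filter" "ICMPFilter" {
  tenant_dn = aci_tenant.TestTenant.id
  name      = "ICMPFilter"
}
resource "aci_filter_entry" "ICMPEntry" {
  filter_dn = aci_filter.ICMPFilter.id
  name      = "ICMPEntry"
  ether_t   = "ip"
  prot      = "icmp"
}
resource "aci_contract" "ICMPContract" {
  tenant_dn = aci_tenant.TestTenant.id
  name      = "ICMPContract"
  # The scope that will come back to haunt us later
  scope     = "application-profile"
}
resource "aci_contract_subject" "ICMPSubject" {
  contract_dn                  = aci_contract.ICMPContract.id
  name                         = "ICMPSubject"
  relation_vz_rs_subj_filt_att = [aci_filter.ICMPFilter.id]
}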

Notice the scope of the contract: it confines connectivity to within the AP rather than the whole VRF. Although not relevant for SW1-SW2 reachability in this specific case, restricting the scope generally helps to avoid unexpected connectivity between endpoints in the same VRF. The only thing left is to assign the contract to the corresponding EPGs:

Figure 11. Assign contract to an EPG
Figure 12. Contract topology with external L2 segment as EPG
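
And the contract assignment itself, assuming EPG SW2 is defined analogously to SW1:

resource "aci_epg_to_contract" "SW1-provides-ICMP" {
  application_epg_dn = aci_application_epg.SW1.id
  contract_dn        = aci_contract.ICMPContract.id
  contract_type      = "provider"
}
resource "aci_epg_to_contract" "SW2-consumes-ICMP" {
  application_epg_dn = aci_application_epg.SW2.id
  contract_dn        = aci_contract.ICMPContract.id
  contract_type      = "consumer"
}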

Time to test the connectivity between switches SW1 and SW2:

SW1# ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
36 bytes from 192.168.0.1: Destination Host Unreachable
Request 0 timed out
36 bytes from 192.168.0.1: Destination Host Unreachable
Request 1 timed out
64 bytes from 192.168.0.2: icmp_seq=2 ttl=254 time=2.259 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=254 time=2.061 ms
64 bytes from 192.168.0.2: icmp_seq=4 ttl=254 time=2.138 ms

The process is rather simple, isn’t it? Configuring an L2Out instead of EPG SW2 requires far fewer steps now, since most of the config is already in place. First, we need to remove the static assignment from EPG SW2. Second, we’ll reuse the freed-up port for the L2Out:

Figure 13. L2Out config, step 1

I didn’t manage to get the L2Out to accept frames in VLAN 1, either tagged or untagged: whenever VLAN 1 was configured on the L2Out, no endpoints were learned whatsoever. Since no faults were raised, I’m not sure whether there is an actual limitation on using VLAN 1 for an L2Out. Anyway, I switched to an SVI plus trunk configuration on SW2:

SW2(config)# feature interface-vlan
SW2(config)# interface ethernet1/46
SW2(config-if)# switchport
SW2(config-if)# switchport mode trunk
SW2(config-if)# vlan 150
SW2(config-vlan)# interface vlan 150
SW2(config-if)# mac-address 0000.0000.0002
SW2(config-if)# ip address 192.168.0.2/24
SW2(config-if)# no shutdown

At the end, the L2Out should be associated with an L2 External Domain; otherwise the corresponding policies will not be deployed.

Since connectivity is governed by contracts, L2Out endpoints should be classified into an external EPG:

Figure 14. L2Out config, step 2
Figure 15. L2Out config, step 2.5

EPG SW1 has already been configured, so only the external EPG SW2 needs to be assigned the contract to allow connectivity:

Figure 16. L2Out external EPG contract assignment
Figure 17. Contract topology with external L2 segment as L2Out
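
For completeness, a partial Terraform sketch of the L2Out pieces, assuming an L2 domain named L2Domain backed by the same VLAN pool. I configured the leaf node/interface selection and the encap VLAN from Figure 13 in the GUI, so they are omitted here; the relation names are per my reading of the aci provider docs and may vary between provider versions:

resource "aci_l2_domain_profile" "L2Domain" {
  name                      = "L2Domain"
  relation_infra_rs_vlan_ns = aci_vlan_pool.TestPool.id
}
resource "aci_l2_outside" "SW2-L2Out" {
  tenant_dn                    = aci_tenant.TestTenant.id
  name                         = "SW2-L2Out"
  relation_l2ext_rs_e_bd       = aci_bridge_domain.TestBD.id
  relation_l2ext_rs_l2_dom_att = aci_l2_domain_profile.L2Domain.id
}
resource "aci_l2out_extepg" "SW2" {
  l2_outside_dn       = aci_l2_outside.SW2-L2Out.id
  name                = "SW2"
  # Consume the same ICMP contract that EPG SW1 provides
  relation_fv_rs_cons = [aci_contract.ICMPContract.id]
}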

So far so good, everything seems to be in place, so let’s test connectivity once more:

SW1# ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
Request 0 timed out
Request 1 timed out
Request 2 timed out
Request 3 timed out
Request 4 timed out

Well, something is obviously broken. One way to check whether the contract is at fault is to disable policy enforcement in the VRF and test connectivity again: if the contract is the issue, the ping should go through, because nothing is enforced anymore; if SW1 cannot reach SW2 even with enforcement disabled, then something is wrong with the L2Out configuration itself.
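
If the fabric is managed with Terraform as in the sketches above, the knob is a single attribute on the existing VRF resource:

resource "aci_vrf" "TestVrf" {
  tenant_dn   = aci_tenant.TestTenant.id
  name        = "TestVrf"
  # Temporarily disable contract enforcement for troubleshooting
  pc_enf_pref = "unenforced"
}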

SW1# ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
64 bytes from 192.168.0.2: icmp_seq=0 ttl=254 time=1.713 ms
64 bytes from 192.168.0.2: icmp_seq=1 ttl=254 time=1.457 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=254 time=1.397 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=254 time=1.415 ms
64 bytes from 192.168.0.2: icmp_seq=4 ttl=254 time=1.548 ms

It is definitely the policy that muddies the waters. However, everything was fine when we configured the EPG in lieu of the L2Out, so there must be no mistake in the filtering, right? As always, the devil is in the details. The contract is the issue, sure, but the filters are fine, and so is the contract application. Looking back, I think there was only one place left where something could go wrong with such a config; still, it took me several hours of desperately changing every option available in the tenant to find out what exactly was blocking the connectivity. Funny enough, no faults were raised within the tenant, meaning there were no contradictions within the object model.

Remember the scope of the contract? We set it to “Application Profile” initially, but what AP does an L2Out belong to? I don’t have an answer for that, unfortunately; however, you are probably getting the feeling that it’s not natural for an L2Out to have a contract with such a scope. Let’s change it to something bigger, “VRF” for instance, and verify the connectivity again:
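
In the Terraform sketch above, that is a one-attribute change on the contract ("context" is the provider's name for the VRF scope):

resource "aci_contract" "ICMPContract" {
  tenant_dn = aci_tenant.TestTenant.id
  name      = "ICMPContract"
  # "context" == VRF scope
  scope     = "context"
}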

SW1# ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
64 bytes from 192.168.0.2: icmp_seq=0 ttl=254 time=1.807 ms
64 bytes from 192.168.0.2: icmp_seq=1 ttl=254 time=1.528 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=254 time=1.412 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=254 time=1.446 ms
64 bytes from 192.168.0.2: icmp_seq=4 ttl=254 time=1.391 ms

At last, we managed to make the L2Out work as intended. Was it easier than configuring L2 connectivity via an EPG? Not quite: additional steps were needed to provision the object (a separate domain, the domain assignment), and the result turned out to be more rigid (a single VLAN per L2Out as a whole). I also had a hard time finding any documentation regarding L2Out, mostly bumping into various blog posts and forums. There was a positive point, though: I came across an interesting book that might prove useful for ACI troubleshooting.

You might be asking yourself: what about L3Out? As you could guess, it is subject to the same behavior as L2Out: a contract with the scope of “Application Profile” will not be rendered for an L3Out either.

Kudos for review: Anastasiia Kuraleva

Follow on Telegram, LinkedIn