/share/doc/IPv6/IMPLEMENTATION
- Implementation Note
- KAME Project
- http://www.kame.net/
- $KAME: IMPLEMENTATION,v 1.216 2001/05/25 07:43:01 jinmei Exp $
- $FreeBSD$
- NOTE: The document tries to describe behaviors/implementation choices
- of the latest KAME/*BSD stack. The description here may not be
- applicable to KAME-integrated *BSD releases, as there are a certain number
- of changes between them. Still, some of the content can be useful for
- KAME-integrated *BSD releases.
- Table of Contents
- 1. IPv6
- 1.1 Conformance
- 1.2 Neighbor Discovery
- 1.3 Scope Zone Index
- 1.3.1 Kernel internal
- 1.3.2 Interaction with API
- 1.3.3 Interaction with users (command line)
- 1.4 Plug and Play
- 1.4.1 Assignment of link-local, and special addresses
- 1.4.2 Stateless address autoconfiguration on hosts
- 1.4.3 DHCPv6
- 1.5 Generic tunnel interface
- 1.6 Address Selection
- 1.6.1 Source Address Selection
- 1.6.2 Destination Address Ordering
- 1.7 Jumbo Payload
- 1.8 Loop prevention in header processing
- 1.9 ICMPv6
- 1.10 Applications
- 1.11 Kernel Internals
- 1.12 IPv4 mapped address and IPv6 wildcard socket
- 1.12.1 KAME/BSDI3 and KAME/FreeBSD228
- 1.12.2 KAME/FreeBSD[34]x
- 1.12.2.1 KAME/FreeBSD[34]x, listening side
- 1.12.2.2 KAME/FreeBSD[34]x, initiating side
- 1.12.3 KAME/NetBSD
- 1.12.3.1 KAME/NetBSD, listening side
- 1.12.3.2 KAME/NetBSD, initiating side
- 1.12.4 KAME/BSDI4
- 1.12.4.1 KAME/BSDI4, listening side
- 1.12.4.2 KAME/BSDI4, initiating side
- 1.12.5 KAME/OpenBSD
- 1.12.5.1 KAME/OpenBSD, listening side
- 1.12.5.2 KAME/OpenBSD, initiating side
- 1.12.6 More issues
- 1.12.7 Interaction with SIIT translator
- 1.13 sockaddr_storage
- 1.14 Invalid addresses on the wire
- 1.15 Node's required addresses
- 1.15.1 Host case
- 1.15.2 Router case
- 1.16 Advanced API
- 1.17 DNS resolver
- 2. Network Drivers
- 2.1 FreeBSD 2.2.x-RELEASE
- 2.2 BSD/OS 3.x
- 2.3 NetBSD
- 2.4 FreeBSD 3.x-RELEASE
- 2.5 FreeBSD 4.x-RELEASE
- 2.6 OpenBSD 2.x
- 2.7 BSD/OS 4.x
- 3. Translator
- 3.1 FAITH TCP relay translator
- 3.2 IPv6-to-IPv4 header translator
- 4. IPsec
- 4.1 Policy Management
- 4.2 Key Management
- 4.3 AH and ESP handling
- 4.4 IPComp handling
- 4.5 Conformance to RFCs and IDs
- 4.6 ECN consideration on IPsec tunnels
- 4.7 Interoperability
- 4.8 Operations with IPsec tunnel mode
- 4.8.1 RFC2401 IPsec tunnel mode approach
- 4.8.2 draft-touch-ipsec-vpn approach
- 5. ALTQ
- 6. Mobile IPv6
- 6.1 KAME node as correspondent node
- 6.2 KAME node as home agent/mobile node
- 6.3 Old Mobile IPv6 code
- 7. Coding style
- 8. Policy on technology with intellectual property right restriction
- 1. IPv6
- 1.1 Conformance
- The KAME kit conforms, or tries to conform, to the latest set of IPv6
- specifications. For future reference we list some of the relevant documents
- below (NOTE: this is not a complete list - this is too hard to maintain...).
- For details please refer to the specific chapter in this document, the RFCs,
- the manpages that come with KAME, or comments in the source code.
- Conformance tests have been performed on past and latest KAME STABLE kits
- by the TAHI project. Results can be viewed at http://www.tahi.org/report/KAME/.
- We also attended Univ. of New Hampshire IOL tests (http://www.iol.unh.edu/)
- in the past, with our past snapshots.
- RFC1639: FTP Operation Over Big Address Records (FOOBAR)
- * RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428,
- then fall back to RFC1639 if that fails.
- RFC1886: DNS Extensions to support IPv6
- RFC1933: (see RFC2893)
- RFC1981: Path MTU Discovery for IPv6
- RFC2080: RIPng for IPv6
- * KAME-supplied route6d, bgpd and hroute6d support this.
- RFC2283: Multiprotocol Extensions for BGP-4
- * so-called "BGP4+".
- * KAME-supplied bgpd supports this.
- RFC2292: Advanced Sockets API for IPv6
- * see RFC3542
- RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM)
- * RFC2362 defines the packet formats and the protocol of PIM-SM.
- RFC2373: IPv6 Addressing Architecture
- * KAME supports node required addresses, and conforms to the scope
- requirement.
- RFC2374: An IPv6 Aggregatable Global Unicast Address Format
- * KAME supports 64-bit length of Interface ID.
- RFC2375: IPv6 Multicast Address Assignments
- * Userland applications use the well-known addresses assigned in the RFC.
- RFC2428: FTP Extensions for IPv6 and NATs
- * RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428,
- then fall back to RFC1639 if that fails.
- RFC2460: IPv6 specification
- RFC2461: Neighbor discovery for IPv6
- * See 1.2 in this document for details.
- RFC2462: IPv6 Stateless Address Autoconfiguration
- * See 1.4 in this document for details.
- RFC2463: ICMPv6 for IPv6 specification
- * See 1.9 in this document for details.
- RFC2464: Transmission of IPv6 Packets over Ethernet Networks
- RFC2465: MIB for IPv6: Textual Conventions and General Group
- * Necessary statistics are gathered by the kernel. Actual IPv6 MIB
- support is provided as patchkit for ucd-snmp.
- RFC2466: MIB for IPv6: ICMPv6 group
- * Necessary statistics are gathered by the kernel. Actual IPv6 MIB
- support is provided as patchkit for ucd-snmp.
- RFC2467: Transmission of IPv6 Packets over FDDI Networks
- RFC2472: IPv6 over PPP
- RFC2492: IPv6 over ATM Networks
- * only PVC is supported.
- RFC2497: Transmission of IPv6 packet over ARCnet Networks
- RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
- RFC2553: (see RFC3493)
- RFC2671: Extension Mechanisms for DNS (EDNS0)
- * see USAGE for how to use it.
- * not supported on kame/freebsd4 and kame/bsdi4.
- RFC2673: Binary Labels in the Domain Name System
- * KAME/bsdi4 supports A6, DNAME and binary label to some extent.
- * KAME apps/bind8 repository has resolver library with partial A6, DNAME
- and binary label support.
- RFC2675: IPv6 Jumbograms
- * See 1.7 in this document for details.
- RFC2710: Multicast Listener Discovery for IPv6
- RFC2711: IPv6 router alert option
- RFC2732: Format for Literal IPv6 Addresses in URL's
- * The spec is implemented in programs that handle URLs
- (like freebsd ftpio(3) and fetch(1), or netbsd ftp(1))
- RFC2874: DNS Extensions to Support IPv6 Address Aggregation and Renumbering
- * KAME/bsdi4 supports A6, DNAME and binary label to some extent.
- * KAME apps/bind8 repository has resolver library with partial A6, DNAME
- and binary label support.
- RFC2893: Transition Mechanisms for IPv6 Hosts and Routers
- * IPv4 compatible address is not supported.
- * automatic tunneling (4.3) is not supported.
- * "gif" interface implements IPv[46]-over-IPv[46] tunnel in a generic way,
- and it covers "configured tunnel" described in the spec.
- See 1.5 in this document for details.
- RFC2894: Router renumbering for IPv6
- RFC3041: Privacy Extensions for Stateless Address Autoconfiguration in IPv6
- RFC3056: Connection of IPv6 Domains via IPv4 Clouds
- * So-called "6to4".
- * "stf" interface implements it. Be sure to read
- draft-itojun-ipv6-transition-abuse-01.txt
- below before configuring it, as there can be security issues.
- RFC3142: An IPv6-to-IPv4 transport relay translator
- * FAITH tcp relay translator (faithd) implements this. See 3.1 for more
- details.
- RFC3152: Delegation of IP6.ARPA
- * libinet6 resolvers contained in the KAME snaps support the use of
- the ip6.arpa domain (with the nibble format) for IPv6 reverse
- lookups.
- RFC3484: Default Address Selection for IPv6
- * the selection algorithm for both source and destination addresses
- is implemented based on the RFC, though some rules are still omitted.
- RFC3493: Basic Socket Interface Extensions for IPv6
- * IPv4 mapped address (3.7) and special behavior of IPv6 wildcard bind
- socket (3.8) are,
- - supported and turned on by default on KAME/FreeBSD[34]
- and KAME/BSDI4,
- - supported but turned off by default on KAME/NetBSD and KAME/FreeBSD5,
- - not supported on KAME/FreeBSD228, KAME/OpenBSD and KAME/BSDI3.
- see 1.12 in this document for details.
- * The AI_ALL and AI_V4MAPPED flags are not supported.
- RFC3542: Advanced Sockets API for IPv6 (revised)
- * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI.
- * Some of the updates in the draft are not implemented yet. See
- TODO.2292bis for more details.
- RFC4007: IPv6 Scoped Address Architecture
- * some part of the documentation (especially about the routing
- model) is not supported yet.
- * zone indices that contain scope types have not been supported yet.
- draft-ietf-ipngwg-icmp-name-lookups-09: IPv6 Name Lookups Through ICMP
- draft-ietf-ipv6-router-selection-07.txt:
- Default Router Preferences and More-Specific Routes
- * router-side: both router preference and specific routes are supported.
- * host-side: only router preference is supported.
- draft-ietf-pim-sm-v2-new-02.txt
- A revised version of RFC2362, which includes the IPv6 specific
- packet format and protocol descriptions.
- draft-ietf-dnsext-mdns-00.txt: Multicast DNS
- * kame/mdnsd has test implementation, which will not be built in
- default compilation. The draft will experience a major change in the
- near future, so don't rely upon it.
- draft-ietf-ipngwg-icmp-v3-02.txt: ICMPv6 for IPv6 specification (revised)
- * See 1.9 in this document for details.
- draft-itojun-ipv6-tcp-to-anycast-01.txt:
- Disconnecting TCP connection toward IPv6 anycast address
- draft-ietf-ipv6-rfc2462bis-06.txt: IPv6 Stateless Address
- Autoconfiguration (revised)
- draft-itojun-ipv6-transition-abuse-01.txt:
- Possible abuse against IPv6 transition technologies (expired)
- * KAME does not implement RFC1933/2893 automatic tunnel.
- * "stf" interface implements some address filters. Refer to stf(4)
- for details. Since there's no way to make 6to4 interface 100% secure,
- we do not include "stf" interface into GENERIC.v6 compilation.
- * kame/openbsd completely disables IPv4 mapped address support.
- * kame/netbsd makes IPv4 mapped address support off by default.
- * See section 1.12.6 and 1.14 for more details.
- draft-itojun-ipv6-flowlabel-api-01.txt: Socket API for IPv6 flow label field
- * no consideration is made against the use of routing headers and such.
- 1.2 Neighbor Discovery
- Our implementation of Neighbor Discovery is fairly stable. Currently
- Address Resolution, Duplicate Address Detection, and Neighbor
- Unreachability Detection are supported. In the near future we will be
- adding an Unsolicited Neighbor Advertisement transmission command as
- an administration tool.
- Duplicate Address Detection (DAD) will be performed when an IPv6 address
- is assigned to a network interface, or the network interface is enabled
- (ifconfig up). It is documented in RFC2462 5.4.
- If DAD fails, the address will be marked "duplicated" and a message will be
- sent to syslog (and usually to the console). The "duplicated" mark
- can be checked with ifconfig. It is the administrator's responsibility to check
- for and recover from DAD failures. We may try to improve failure recovery
- in future KAME code.
- A successor version of RFC2462 (called rfc2462bis) clarifies the
- behavior when DAD fails (i.e., duplicate is detected): if the
- duplicate address is a link-local address formed from an interface
- identifier based on the hardware address which is supposed to be
- uniquely assigned (e.g., EUI-64 for an Ethernet interface), IPv6
- operation on the interface should be disabled. The KAME
- implementation supports this as follows: if this type of duplicate is
- detected, the kernel marks "disabled" in the ND specific data
- structure for the interface. Every IPv6 I/O operation in the kernel
- checks this mark, and the kernel will drop packets received on or
- being sent to the "disabled" interface. Whether the IPv6 operation is
- disabled or not can be confirmed by the ndp(8) command. See the man
- page for more details.
- The DAD procedure may not be effective on certain network interfaces/drivers.
- If a network driver needs a long initialization time (this is common with
- wireless network interfaces), and the driver mistakenly raises
- IFF_RUNNING before it becomes ready, the DAD code will try to transmit
- DAD probes to the not-yet-ready driver and the packets will not go out
- from the interface. In such cases, network drivers should be corrected.
- Some network drivers loop multicast packets back to themselves,
- even if instructed not to do so (especially in promiscuous mode). In
- such cases DAD may fail, because the DAD engine sees an inbound NS packet
- (actually from the node itself) and considers it a sign of a
- duplicate. In this case, drivers should be corrected to honor
- IFF_SIMPLEX behavior. For example, you may need to check source MAC
- address on an inbound packet, and reject it if it is from the node
- itself.
- Neighbor Discovery specification (RFC2461) does not talk about neighbor
- cache handling in the following cases:
- (1) when there was no neighbor cache entry, node received unsolicited
- RS/NS/NA/redirect packet without link-layer address
- (2) neighbor cache handling on medium without link-layer address
- (we need a neighbor cache entry for IsRouter bit)
- For (1), we implemented a workaround based on discussions on the IETF ipngwg mailing
- list. For more details, see the comments in the source code and email
- thread started from (IPng 7155), dated Feb 6 1999.
- IPv6 on-link determination rule (RFC2461) is quite different from
- assumptions in BSD IPv4 network code. To implement the behavior in
- RFC2461 section 6.3.6 (3), the kernel needs to know the default
- outgoing interface. To configure the default outgoing interface, use
- commands like "ndp -I de0" as root. Then the kernel will have a
- "default" route to the interface with the cloning "C" bit being on.
- This default route will cause a neighbor cache entry to be created for every
- destination that does not match an explicit route entry.
- Note that we intentionally disable configuring the default interface
- by default. This is because we found it sometimes caused inconvenient
- situations while it was rarely useful in practice. For example,
- consider a destination that has both IPv4 and IPv6 addresses but is
- only reachable via IPv4. Since our getaddrinfo(3) prefers IPv6 by
- default, a (TCP) application using the library with PF_UNSPEC first
- tries to connect to the IPv6 address. If we turn on RFC 2461 6.3.6
- (3), we have to wait for quite a long period before the first attempt
- to make a connection fails. If we turn it off, the first attempt will
- immediately fail with EHOSTUNREACH, and then the application can try
- the next, reachable address.
- The notion of the default interface is also disabled when the node is
- acting as a router. The reason is that routers tend to control all
- routes stored in the kernel and the default route automatically
- installed would rather confuse the routers. Note that the spec misuses
- the words "host" and "node" in several places in Section 5.2 of RFC
- 2461. We basically read the word "node" in this section as "host,"
- and thus believe the implementation policy does not break the
- specification.
- To avoid possible DoS attacks and infinite loops, the KAME stack will accept
- only 10 options on an ND packet. Therefore, if you have 20 prefix options
- attached to an RA, only the first 10 prefixes will be recognized.
- If this troubles you, please contact the KAME team and/or modify
- nd6_maxndopt in sys/netinet6/nd6.c. If there are high demands we may
- provide a sysctl knob for the variable.
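- The effect of the option limit can be illustrated with a small user-space
- sketch. This is not the KAME kernel code: the function name
- demo_count_nd_options() and the constant DEMO_MAX_ND_OPTIONS are
- hypothetical, and the constant merely stands in for nd6_maxndopt. The
- sketch walks type/length options (length counted in units of 8 octets),
- rejects a zero-length option so that the loop always advances, and stops
- looking once the limit has been reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's nd6_maxndopt limit. */
#define DEMO_MAX_ND_OPTIONS 10

/*
 * Walk the options that follow an ND message header.  Each option starts
 * with a type octet and a length octet counted in units of 8 octets.
 * Returns the number of options accepted (at most DEMO_MAX_ND_OPTIONS),
 * or -1 if an option has length 0 or runs past the end of the buffer.
 */
int demo_count_nd_options(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int accepted = 0;

    assert(buf != NULL || len == 0);

    while (len - off >= 2 && accepted < DEMO_MAX_ND_OPTIONS) {
        size_t optlen = (size_t)buf[off + 1] * 8;

        /* A zero length would never advance "off": reject it. */
        if (optlen == 0 || optlen > len - off)
            return -1;
        accepted++;
        off += optlen;
    }
    /* Options beyond the limit are simply not examined. */
    return accepted;
}
```

- With 20 well-formed options in the buffer, the function reports 10, which
- mirrors the "only the first 10 prefixes" behavior described above.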
- Proxy Neighbor Advertisement support is implemented in the kernel.
- For instance, you can configure it by using the following command:
- # ndp -s fe80::1234%ne0 0:1:2:3:4:5 proxy
- where ne0 is the interface which attaches to the same link as the
- proxy target.
- There are certain limitations, though:
- - It does not send unsolicited multicast NA on configuration. This is MAY
- behavior in RFC2461.
- - It does not add random delay before transmission of solicited NA. This is
- SHOULD behavior in RFC2461.
- - We cannot configure proxy NDP for off-link address. The target address for
- proxying must be link-local address, or must be in prefixes configured to
- node which does proxy NDP.
- - RFC2461 is unclear about whether it is legal for a host to perform proxy ND.
- We do not prohibit hosts from doing proxy ND, but it is of very limited
- use.
- Starting mid March 2000, we support Neighbor Unreachability Detection
- (NUD) on p2p interfaces, including tunnel interfaces (gif). NUD is
- turned on by default. Before March 2000 the KAME stack did not
- perform NUD on p2p interfaces. If the change raises any
- interoperability issues, you can turn NUD off/on on a per-interface
- basis. Use "ndp -i interface -nud" to turn it off. Consult ndp(8)
- for details.
- RFC2461 specifies an upper-layer reachability confirmation hint. Whenever
- an upper-layer reachability confirmation hint arrives, the ND process can use it
- to optimize neighbor discovery: it can omit the real ND exchange
- and keep the neighbor cache state in REACHABLE.
- We currently have two sources for hints: (1) setsockopt(IPV6_REACHCONF)
- defined by the RFC3542 API, and (2) hints from tcp(6)_input.
- It is questionable if they are really trustworthy. For example, a
- rogue userland program can use IPV6_REACHCONF to confuse the ND
- process. Neighbor cache is a system-wide information pool, and it is
- bad to allow a single process to affect others. Also, tcp(6)_input
- can be hosed by hijack attempts. It is wrong to allow hijack attempts
- to affect the ND process.
- Starting June 2000, the ND code has a protection mechanism against
- incorrect upper-layer reachability confirmation. The ND code counts
- subsequent upper-layer hints. If the number of hints reaches the
- maximum, the ND code will ignore further upper-layer hints and run
- the real ND process to confirm reachability to the peer. The sysctl
- net.inet6.icmp6.nd6_maxnudhint defines the maximum number of subsequent
- upper-layer hints to be accepted.
- (From April 2000 to June 2000, we rejected setsockopt(IPV6_REACHCONF) from
- non-root processes. After a local discussion, it appears that hints are not
- that trustworthy even if they are from privileged processes)
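- The hint-counting protection can be pictured with the following sketch.
- It is only an illustration under stated assumptions: the structure
- demo_nud_state, its field, and both function names are hypothetical, the
- max_hints parameter stands in for the nd6_maxnudhint sysctl value, and
- resetting the counter when a real ND exchange confirms reachability is an
- assumption of this sketch rather than a statement about the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-neighbor bookkeeping; not the kernel's data structure. */
struct demo_nud_state {
    int hint_count;   /* upper-layer hints accepted since the last reset */
};

/*
 * Offer one upper-layer reachability hint.
 * Returns 1 if the hint is accepted (the entry may stay REACHABLE without
 * a real ND exchange), or 0 if the limit has been reached and the caller
 * should fall back to the real ND process.
 */
int demo_nud_hint(struct demo_nud_state *st, int max_hints)
{
    assert(st != NULL);

    if (st->hint_count >= max_hints)
        return 0;
    st->hint_count++;
    return 1;
}

/* Assumed behavior: a real ND confirmation starts a fresh count. */
void demo_nud_confirmed(struct demo_nud_state *st)
{
    assert(st != NULL);
    st->hint_count = 0;
}
```

- The point of the design is that no stream of hints, however long, can
- keep an entry REACHABLE forever without a real ND exchange.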
- If inbound ND packets carry invalid values, the KAME kernel will
- drop these packets and increment a statistics variable. See
- "netstat -sn", icmp6 section. For a detailed debugging session, you can
- turn on syslog output from the kernel on errors, by turning on sysctl MIB
- net.inet6.icmp6.nd6_debug. nd6_debug can be turned on at bootstrap
- time, by defining the ND6_DEBUG kernel compilation option (so you can
- debug behavior during bootstrap). nd6_debug configuration should
- only be used for test/debug purposes - for a production environment,
- nd6_debug must be set to 0. If you leave it at 1, malicious parties
- can inject broken packets and fill up the /var/log partition.
- 1.3 Scope Zone Index
- IPv6 uses scoped addresses. It is therefore very important to
- specify the scope zone index (link index for a link-local address, or
- site index for a site-local address) with an IPv6 address. Without a
- zone index, a scoped IPv6 address is ambiguous to the kernel, and
- the kernel would not be able to determine the outbound zone for a
- packet to the scoped address. KAME code tries to address the issue in
- several ways.
- The entire architecture of scoped addresses is documented in RFC4007.
- One non-trivial point of the architecture is that the link scope is
- (theoretically) larger than the interface scope. That is, two
- different interfaces can belong to the same link. However, in
- normal operation, we can assume that there is a 1-to-1 relationship
- between links and interfaces. In other words, we can usually put
- links and interfaces in the same scope type. The current KAME
- implementation assumes the 1-to-1 relationship. In particular, we use
- interface names such as "ne1" as unique link identifiers. This would
- be much more human-readable and intuitive than numeric identifiers,
- but please keep in mind the theoretical difference between links
- and interfaces.
- Site-local addresses are very vaguely defined in the specs, and both
- the specification and the KAME code need tons of improvements to
- enable their actual use. For example, it is still very unclear how we
- define a site, or how we resolve host names in a site. There is work
- underway to define the behavior of routers at site borders, but we have
- almost no code for site boundary node support (neither forwarding nor
- routing) and we bet almost no one has. At this moment, we recommend
- that you use global addresses for experiments - there are way too many
- pitfalls if you use site-local addresses.
- 1.3.1 Kernel internal
- In the kernel, the link index for a link-local scope address is
- embedded into the 2nd 16bit-word (the 3rd and 4th bytes) in the IPv6
- address.
- For example, you may see something like:
- fe80:1::200:f8ff:fe01:6317
- in the routing table and the interface address structure (struct
- in6_ifaddr). The address above is a link-local unicast address which
- belongs to a network link whose link identifier is 1 (note that it
- equals the interface index by the assumption of our
- implementation). The embedded index enables us to identify IPv6
- link-local addresses over multiple links effectively and with only a
- little code change.
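- The embedded form can be illustrated from user space as follows. This is
- purely a demonstration of the byte layout described above (3rd and 4th
- bytes of the address); the function names demo_embed_zone() and
- demo_extract_zone() are hypothetical, and real kernel code must go
- through the scope routines described later in this section rather than
- touching the bytes directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

/*
 * Illustration only: store a 16-bit link index in the 2nd 16-bit word
 * (bytes 2 and 3) of the address, in network byte order.
 */
void demo_embed_zone(struct in6_addr *a, uint16_t zone)
{
    assert(a != NULL);
    a->s6_addr[2] = (uint8_t)(zone >> 8);
    a->s6_addr[3] = (uint8_t)(zone & 0xff);
}

/*
 * Illustration only: read the embedded index back and clear it, so the
 * address returns to the form that may appear on the wire.
 */
uint16_t demo_extract_zone(struct in6_addr *a)
{
    uint16_t zone;

    assert(a != NULL);
    zone = (uint16_t)((a->s6_addr[2] << 8) | a->s6_addr[3]);
    a->s6_addr[2] = 0;
    a->s6_addr[3] = 0;
    return zone;
}
```

- Embedding index 1 into fe80::200:f8ff:fe01:6317 yields the bytes that
- print as fe80:1::200:f8ff:fe01:6317, the example shown above.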
- The use of the internal format must be limited inside the kernel. In
- particular, addresses sent by an application should not contain the
- embedded index (except via some very special APIs such as routing
- sockets). Instead, the index should be specified in the sin6_scope_id
- field of a sockaddr_in6 structure. Obviously, packets sent to or
- received from the wire must not contain the embedded index either, since the
- index is meaningful only within the sending/receiving node.
- In order to deal with the differences, several kernel routines are
- provided. These are available by including <netinet6/scope_var.h>.
- Typically, the following functions will be most generally used:
- - int sa6_embedscope(struct sockaddr_in6 *sa6, int defaultok);
- Embed sa6->sin6_scope_id into sa6->sin6_addr. If sin6_scope_id is
- 0, defaultok is non-0, and the default zone ID (see RFC4007) is
- configured, the default ID will be used instead of the value of the
- sin6_scope_id field. On success, sa6->sin6_scope_id will be reset
- to 0.
- This function returns 0 on success, or a non-0 error code otherwise.
-
- - int sa6_recoverscope(struct sockaddr_in6 *sa6);
- Extract embedded zone ID in sa6->sin6_addr and set
- sa6->sin6_scope_id to that ID. The embedded ID will be cleared with
- 0.
- This function returns 0 on success, or a non-0 error code otherwise.
- - int in6_clearscope(struct in6_addr *in6);
- Reset the embedded zone ID in 'in6' to 0. This function never fails, and
- returns 0 if the original address is intact or non 0 if the address is
- modified. The return value doesn't matter in most cases; currently, the
- only point where we care about the return value is ip6_input() for checking
- whether the source or destination address of the incoming packet is in
- the embedded form.
- - int in6_setscope(struct in6_addr *in6, struct ifnet *ifp,
- u_int32_t *zoneidp);
- Embed zone ID determined by the address scope type for 'in6' and the
- interface 'ifp' into 'in6'. If zoneidp is non NULL, *zoneidp will
- also have the zone ID.
- This function returns 0 on success, or a non-0 error code otherwise.
- The typical usage of these functions is as follows:
- sa6_embedscope() will be used at the socket or transport layer to
- convert a sockaddr_in6 structure passed by an application into the
- kernel-internal form. In this usage, the second argument is often the
- 'ip6_use_defzone' global variable.
- sa6_recoverscope() will also be used at the socket or transport layer
- to convert an in6_addr structure with the embedded zone ID into a
- sockaddr_in6 structure with the corresponding ID in the sin6_scope_id
- field (and without the embedded ID in sin6_addr).
- in6_clearscope() will be used just before sending a packet to the wire
- to remove the embedded ID. In general, this must be done at the last
- stage of an output path, since otherwise the address would lose the ID
- and could be ambiguous with regard to scope.
- in6_setscope() will be used when the kernel receives a packet from the
- wire to construct the kernel internal form for each address field in
- the packet (typical examples are the source and destination addresses
- of the packet). In the typical usage, the third argument 'zoneidp'
- will be NULL. A non-NULL value will be used when the validity of the
- zone ID must be checked, e.g., when forwarding a packet to another
- link (see ip6_forward() for this usage).
- An application, when sending a packet, is basically assumed to specify
- the appropriate scope zone of the destination address by the
- sin6_scope_id field (this might be done transparently from the
- application with getaddrinfo() and the extended textual format - see
- below), or at least the default scope zone(s) must be configured as a
- last resort. In some cases, however, an application could specify an
- ambiguous address with regard to scope, expecting it is disambiguated
- in the kernel by some other means. A typical usage is to specify the
- outgoing interface through another API, which can disambiguate the
- unspecified scope zone. Such a usage is not recommended, but the
- kernel implements a trick to deal with even this case.
- A rough sketch of the trick can be summarized as the following
- sequence.
- sa6_embedscope(dst, ip6_use_defzone);
- in6_selectsrc(dst, ..., &ifp, ...);
- in6_setscope(&dst->sin6_addr, ifp, NULL);
- sa6_embedscope() first tries to convert sin6_scope_id (or the default
- zone ID) into the kernel-internal form. This can fail with an
- ambiguous destination, but it still tries to get the outgoing
- interface (ifp) in the attempt of determining the source address of
- the outgoing packet using in6_selectsrc(). If the interface is
- detected, and the scope zone was originally ambiguous, in6_setscope()
- can finally determine the appropriate ID with the address itself and
- the interface, and construct the kernel-internal form. See, for
- example, comments in udp6_output() for a more concrete example.
- In any case, kernel routines except ones in netinet6/scope6.c MUST NOT
- directly refer to the embedded form. They MUST use the above
- interface functions. In particular, kernel routines MUST NOT have the
- following code fragment:
- /* This is a bad practice. Don't do this */
- if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
- sin6->sin6_addr.s6_addr16[1] = htons(ifp->if_index);
- This is bad for several reasons. First, address ambiguity is not
- specific to link-local addresses (any non-global multicast addresses
- are inherently ambiguous, and this is particularly true for
- interface-local addresses). Secondly, this is vulnerable to future
- changes of the embedded form (the embedded position may change, or the
- zone ID may not actually be the interface index). Only scope6.c
- routines should know the details.
- The above code fragment should thus actually be as follows:
- /* This is correct. */
- in6_setscope(&sin6->sin6_addr, ifp, NULL);
- (and catch errors if possible and necessary)
- 1.3.2 Interaction with API
- There are several candidate APIs for dealing with scoped addresses
- without ambiguity.
- The IPV6_PKTINFO ancillary data type or socket option defined in the
- advanced API (RFC2292 or RFC3542) can specify
- the outgoing interface of a packet. Similarly, the IPV6_PKTINFO or
- IPV6_RECVPKTINFO socket options tell kernel to pass the incoming
- interface to user applications.
- These options are enough to disambiguate scoped addresses of an
- incoming packet, because we can uniquely identify the corresponding
- zone of the scoped address(es) by the incoming interface. However,
- they are too strong for outgoing packets. For example, consider a
- multi-sited node and suppose that more than one interface of the node
- belongs to the same site. When we want to send a packet to the site,
- we can only specify one of the interfaces for the outgoing packet with
- these options; we cannot just say "send the packet to (one of the
- interfaces of) the site."
- Another candidate is the sin6_scope_id member in the
- sockaddr_in6 structure, defined in RFC2553. The KAME kernel
- interprets the sin6_scope_id field properly in order to disambiguate scoped
- addresses. For example, if an application passes a sockaddr_in6
- structure that has a non-zero sin6_scope_id value to the sendto(2)
- system call, the kernel should send the packet to the appropriate zone
- according to the sin6_scope_id field. Similarly, when the source or
- the destination address of an incoming packet is a scoped one, the
- kernel should detect the correct zone identifier based on the address
- and the receiving interface, fill the identifier in the sin6_scope_id
- field of a sockaddr_in6 structure, and then pass the packet to an
- application via the recvfrom(2) system call, etc.
- However, the semantics of the sin6_scope_id is still vague and on the
- way to standardization. Additionally, not so many operating systems
- support the behavior above at this moment.
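- As a minimal sketch of the sin6_scope_id approach, an application could
- fill in a destination as shown below before passing it to sendto(2). The
- helper name demo_scoped_dst() is hypothetical, and the sketch assumes the
- 1-to-1 relationship between links and interfaces, so that the interface
- index returned by if_nametoindex(3) can serve as the link zone index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Build a sockaddr_in6 for a scoped address on the link reached through
 * the interface "ifname".  Returns 0 on success, -1 if the interface name
 * or the address text is not usable.
 */
int demo_scoped_dst(struct sockaddr_in6 *dst, const char *addr_text,
                    const char *ifname, uint16_t port)
{
    unsigned int idx;

    assert(dst != NULL && addr_text != NULL && ifname != NULL);

    idx = if_nametoindex(ifname);     /* 0 means "no such interface" */
    if (idx == 0)
        return -1;

    memset(dst, 0, sizeof(*dst));
#ifdef SIN6_LEN
    dst->sin6_len = sizeof(*dst);     /* only on systems that have sin6_len */
#endif
    dst->sin6_family = AF_INET6;
    dst->sin6_port = htons(port);
    if (inet_pton(AF_INET6, addr_text, &dst->sin6_addr) != 1)
        return -1;
    dst->sin6_scope_id = idx;         /* zone goes here, not into the address */
    return 0;
}
```

- The address bytes themselves stay free of any embedded index; only the
- sin6_scope_id field carries the zone.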
- In summary,
- - If your target system is limited to KAME based ones (i.e. BSD
- variants and KAME snaps), use the sin6_scope_id field assuming the
- kernel behavior described above.
- - Otherwise, (i.e. if your program should be portable to systems
- other than BSDs)
- + Use the advanced API to disambiguate scoped addresses of incoming
- packets.
- + To disambiguate scoped addresses of outgoing packets,
- * if it is okay to just specify the outgoing interface, use the
- advanced API. This would be the case, for example, when you
- should only consider link-local addresses and your system
- assumes 1-to-1 relationship between links and interfaces.
- * otherwise, sorry but you lose. Please rush the IETF IPv6
- community into standardizing the semantics of the sin6_scope_id
- field.
- Routing daemons and configuration programs, like route6d and ifconfig,
- will need to manipulate the "embedded" zone index. These programs use
- routing sockets and ioctls (like SIOCGIFADDR_IN6) and the kernel API
- will return IPv6 addresses with the 2nd 16bit-word filled in. The
- APIs are for manipulating kernel internal structure. Programs that
- use these APIs have to be prepared for differences between kernels
- anyway.
- getaddrinfo(3) and getnameinfo(3) support an extended numeric IPv6
- syntax, as documented in RFC4007. You can specify the outgoing link,
- by using the name of the outgoing interface as the link, like
- "fe80::1%ne0" (again, note that we assume there is 1-to-1 relationship
- between links and interfaces.) This way you will be able to specify a
- link-local scoped address without much trouble.
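- A minimal sketch of parsing the extended format with getaddrinfo(3)
- follows. The helper name demo_parse_scoped() is hypothetical. The sketch
- passes AI_NUMERICHOST so that no DNS lookup is attempted, and copies out
- the resulting sockaddr_in6, whose sin6_scope_id is expected to carry the
- zone given after the "%" sign.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/*
 * Parse a numeric IPv6 address, optionally followed by "%zone", into a
 * sockaddr_in6.  Returns 0 on success or the non-zero getaddrinfo(3)
 * error code otherwise.
 */
int demo_parse_scoped(const char *text, struct sockaddr_in6 *out)
{
    struct addrinfo hints, *res = NULL;
    int error;

    assert(text != NULL && out != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_NUMERICHOST;   /* numeric text only, no name lookup */

    error = getaddrinfo(text, NULL, &hints, &res);
    if (error != 0)
        return error;

    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```

- For example, demo_parse_scoped("fe80::1%ne0", &sa6) would be expected to
- succeed on a node that has an interface named ne0, leaving the index of
- that interface in sa6.sin6_scope_id.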
- Other APIs like inet_pton(3) and inet_ntop(3) are inherently
- unfriendly with scoped addresses, since they are unable to annotate
- addresses with zone identifier.
- 1.3.3 Interaction with users (command line)
- Most user applications now support the extended numeric IPv6
- syntax. In this case, you can specify the outgoing link by using the name
- of the outgoing interface like "fe80::1%ne0" (sorry for the duplicated
- notice, but please recall again that we assume a 1-to-1 relationship
- between links and interfaces). This is even the case for some
- management tools such as route(8) or ndp(8). For example, to install
- the IPv6 default route by hand, you can type
- # route add -inet6 default fe80::9876:5432:1234:abcd%ne0
- (Although we suggest running dynamic routing instead of static
- routes, in order to avoid configuration mistakes.)
- Some applications have command line options for specifying an
- appropriate zone of a scoped address (like "ping6 -I ne0 ff02::1" to
- specify the outgoing interface). However, you can't always expect such
- options. Additionally, specifying the outgoing "interface" is in
- theory an overspecification as a way to specify the outgoing "link"
- (see above). Thus, we recommend that you use the extended format
- described above. This should also apply to the case where the outgoing
- interface is specified.
- In any case, when you specify a scoped address on the command line,
- NEVER write the embedded form (such as ff02:1::1 or fe80:2::fedc),
- which should only be used inside the kernel (see Section 1.3.1) and
- is not supposed to work.
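A tool that wants to warn about the embedded form could apply a check along the following lines. This is a sketch only: looks_embedded is a hypothetical name, and the set of scopes tested (link-local unicast, plus interface-local and link-local multicast) is an assumption about where the kernel embeds the zone index in the 2nd 16bit-word.

```python
import ipaddress


def looks_embedded(text):
    """Return True if a scoped address has a non-zero 2nd 16-bit word.

    Hypothetical check: which scopes use the embedded form is assumed
    here, not taken from the kernel sources.
    """
    p = ipaddress.IPv6Address(text).packed
    # fe80::/10 is the link-local unicast prefix.
    link_local_unicast = p[0] == 0xFE and (p[1] & 0xC0) == 0x80
    # ff01::/16 and ff02::/16 are interface-local and link-local multicast.
    narrow_multicast = p[0] == 0xFF and (p[1] & 0x0F) in (1, 2)
    return (link_local_unicast or narrow_multicast) and p[2:4] != b"\x00\x00"


if __name__ == "__main__":
    for a in ("ff02:1::1", "fe80:2::fedc", "fe80::1", "2001:db8::1"):
        print(a, looks_embedded(a))
```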
- 1.4 Plug and Play
- The KAME kit implements most of IPv6 stateless address
- autoconfiguration in the kernel.
- Neighbor Discovery functions are implemented in the kernel as a whole.
- Router Advertisement (RA) input for hosts is implemented in the
- kernel. Router Solicitation (RS) output for end hosts, RS input
- for routers, and RA output for routers are implemented in
- userland.
- 1.4.1 Assignment of link-local, and special addresses
- An IPv6 link-local address is generated from the IEEE802 address
- (ethernet MAC address). Each interface is assigned an IPv6 link-local
- address automatically when the interface comes up (IFF_UP). A direct
- route for the link-local address is also added to the routing table.
- Here is sample output of the netstat command:
- Internet6:
- Destination        Gateway        Flags      Netif Expire
- fe80::%ed0/64      link#1         UC           ed0
- fe80::%ep0/64      link#2         UC           ep0
- Interfaces that have no IEEE802 address (pseudo interfaces like tunnel
- interfaces, or ppp interfaces) will borrow an IEEE802 address from other
- interfaces, such as ethernet interfaces, whenever possible.
- If there is no IEEE802 hardware attached, a last-resort pseudorandom
- value derived from MD5(hostname) will be used as the source of the
- link-local address. If it is not suitable for your usage, you will need
- to configure the link-local address manually.
- If an interface is not capable of handling IPv6 (for example, because it
- lacks multicast support), a link-local address will not be assigned to
- that interface. See section 2 for details.
- Each interface joins the solicited-node multicast address and the
- link-local all-nodes multicast address (e.g. ff02::1:ff01:6317
- and ff02::1, respectively, on the link the interface is attached to).
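The two derived addresses mentioned in this section can be computed by hand. The Python sketch below assumes the usual modified EUI-64 mapping for ethernet (flip the universal/local bit, insert ff:fe in the middle) and the standard solicited-node prefix ff02::1:ff00:0/104. The function names are hypothetical, and the MAC address is a documentation value, not one taken from this note.

```python
import ipaddress


def link_local_from_mac(mac):
    """Build fe80::/64 + modified EUI-64 from a 48-bit MAC like '00:00:5e:00:53:01'."""
    b = bytearray(int(x, 16) for x in mac.split(":"))
    if len(b) != 6:
        raise ValueError("expected a 48-bit MAC address")
    b[0] ^= 0x02  # invert the universal/local bit
    iid = bytes(b[:3]) + b"\xff\xfe" + bytes(b[3:])
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + iid)


def solicited_node(addr):
    """Return ff02::1:ffXX:XXXX, where XX:XXXX are the low 24 bits of addr."""
    low24 = ipaddress.IPv6Address(str(addr)).packed[-3:]
    return ipaddress.IPv6Address(b"\xff\x02" + b"\x00" * 9 + b"\x01\xff" + low24)


if __name__ == "__main__":
    ll = link_local_from_mac("00:00:5e:00:53:01")
    print(ll, solicited_node(ll))
```

An address whose low 24 bits are 01:63:17 maps to ff02::1:ff01:6317, the example given above.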
- In addition to a link-local address, the loopback address (::1) will be
- assigned to the loopback interface. Also, ::1/128 and ff01::/32 are
- automatically added to the routing table, and the loopback interface
- joins the node-local multicast group ff01::1.
- 1.4.2 Stateless address autoconfiguration on hosts
- In the IPv6 specification, nodes are separated into two categories:
- routers and hosts. Routers forward packets addressed to others; hosts do
- not forward packets. net.inet6.ip6.forwarding defines whether this
- node is a router or a host (router if it is 1, host if it is 0).
- It is NOT recommended to change net.inet6.ip6.forwarding while the node
- is in operation. The IPv6 specification defines behavior for "host" and
- "router" quite differently, and switching from one to the other can cause
- serious trouble. It is recommended to configure the variable at bootstrap
- time only.
- The first step in stateless address configuration is Duplicate Address
- Detection (DAD). See 1.2 for more detail on DAD.
- When a host hears a Router Advertisement from a router, the host may
- autoconfigure itself by stateless address autoconfiguration. This
- behavior can be controlled by the net.inet6.ip6.accept_rtadv sysctl
- variable and a per-interface flag managed in the kernel. The latter,
- which we call "if_accept_rtadv" here, can be changed by the ndp(8)
- command (see the manpage for more details). When the sysctl variable
- is set to 1, and the flag is set, the host autoconfigures itself.
- Through autoconfiguration, network address prefixes for the receiving
- interface (usually global address prefixes) are added. The default
- route is also configured.
- Routers periodically generate Router Advertisement packets. To
- request an adjacent router to generate an RA packet, a host can transmit
- a Router Solicitation. To generate an RS packet at any time, use the
- "rtsol" command. The "rtsold" daemon is also available. "rtsold"
- generates Router Solicitations whenever necessary, and it works well
- for nomadic usage (notebooks/laptops). If one wishes to ignore Router
- Advertisements, use sysctl to set net.inet6.ip6.accept_rtadv to 0.
- Additionally, the ndp(8) command can be used to control the behavior
- on a per-interface basis.
- To generate Router Advertisements from a router, use the "rtadvd" daemon.
- Note that the IPv6 specification assumes the following items and that
- nonconforming cases are left unspecified:
- - Only hosts will listen to router advertisements
- - Hosts have a single network interface (except loopback)
- It is therefore unwise to enable net.inet6.ip6.accept_rtadv on routers
- or multi-interface hosts. A misconfigured node can behave strangely
- (KAME code allows nonconforming configurations, for those who would like
- to do some experiments).
- To summarize the sysctl knobs:
- accept_rtadv  forwarding  role of the node
- ---           ---         ---
- 0             0           host (to be manually configured)
- 0             1           router
- 1             0           autoconfigured host
-                           (spec assumes that hosts have a single
-                           interface only, autoconfigured hosts
-                           with multiple interfaces are
-                           out-of-scope)
- 1             1           invalid, or experimental
-                           (out-of-scope of spec)
- The if_accept_rtadv flag is consulted only when accept_rtadv is 1 (the
- latter two cases). The flag does not have any effect when the sysctl
- variable is 0.
- See 1.2 for the relationship between DAD and autoconfiguration.
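The combination of the two sysctl variables and the per-interface flag can be written down as a small decision table. The Python below only restates the summary above; node_role and accepts_ra are hypothetical names, and nothing here queries a real kernel.

```python
def node_role(accept_rtadv, forwarding):
    """Map (net.inet6.ip6.accept_rtadv, net.inet6.ip6.forwarding) to a role."""
    table = {
        (0, 0): "host (to be manually configured)",
        (0, 1): "router",
        (1, 0): "autoconfigured host",
        (1, 1): "invalid, or experimental",
    }
    return table[(accept_rtadv, forwarding)]


def accepts_ra(accept_rtadv, if_accept_rtadv):
    """The per-interface flag matters only when the sysctl variable is 1."""
    return accept_rtadv == 1 and bool(if_accept_rtadv)


if __name__ == "__main__":
    for rtadv in (0, 1):
        for fwd in (0, 1):
            print(rtadv, fwd, node_role(rtadv, fwd))
```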
- 1.4.3 DHCPv6
- We supply a tiny DHCPv6 server/client in kame/dhcp6. However, the
- implementation is immature (for example, it does NOT implement
- address lease/release), and it is not in the default compilation tree on
- some platforms. If you want to experiment with it, compile it on your
- own.
- DHCPv6 and autoconfiguration also need more work. The "Managed" and
- "Other" bits in RA have no special effect on the stateful autoconfiguration
- procedure in the DHCPv6 client program (the "Managed" bit actually prevents
- stateless autoconfiguration, but no special action will be taken for the
- DHCPv6 client).
- 1.5 Generic tunnel interface
- GIF (Generic InterFace) is a pseudo interface for configured tunnels.
- Details are described in the gif(4) manpage.
- Currently
- v6 in v6
- v6 in v4
- v4 in v6
- v4 in v4
- are available. Use "gifconfig" to assign the physical (outer) source
- and destination addresses to gif interfaces.
- A configuration that uses the same address family for the inner and
- outer IP headers (v4 in v4, or v6 in v6) is dangerous. It is very easy
- to configure interfaces and routing tables to perform an infinite level
- of tunneling. Please be warned.
- gif can be configured to be ECN-friendly. See 4.5 for ECN-friendliness
- of tunnels, and the gif(4) manpage for how to configure it.
- If you would like to configure an IPv4-in-IPv6 tunnel with a gif
- interface, read gif(4) carefully. You may need to remove the IPv6
- link-local address automatically assigned to the gif interface.
- 1.6 Address Selection
- 1.6.1 Source Address Selection
- The KAME kernel chooses the source address for an outgoing packet
- sent from a user application as follows:
- 1. if the source address is explicitly specified via an IPV6_PKTINFO
- ancillary data item or the socket option of that name, just use it.
- Note that this item/option overrides the bound address of the
- corresponding (datagram) socket.
- 2. if the corresponding socket is bound, use the bound address.
- 3. otherwise, the kernel first tries to find the outgoing interface of
- the packet. If it fails, the source address selection also fails.
- If the kernel can find an interface, choose the most appropriate
- address based on the algorithm described in RFC3484.
- The policy table used in this algorithm is stored in the kernel.
- To install or view the policy, use the ip6addrctl(8) command. The
- kernel does not have a pre-installed policy. It is expected that the
- default policy described in RFC3484 will be installed at
- bootstrap time using this command.
- RFC3484 allows an implementation to add implementation-specific
- rules with higher precedence than the rule "Use longest matching
- prefix." KAME's implementation has the following additional rules
- (applied in the order listed):
- - prefer addresses on alive interfaces, that is, interfaces with
- the UP flag being on. This rule is particularly useful for
- routers, since some routing daemons stop advertising prefixes
- (addresses) on interfaces that have gone down.
- - prefer addresses on "preferred" interfaces. "Preferred"
- interfaces can be specified by the ndp(8) command. By default,
- no interface is preferred, that is, this rule does not apply.
- Again, this rule is particularly useful for routers, since there
- is a convention, among router administrators, of assigning
- "stable" addresses on a particular interface (typically a
- loopback interface).
- In any case, addresses that break the scope zone of the
- destination, or addresses whose zone does not contain the outgoing
- interface, are never chosen.
- When the procedure above fails, the kernel usually returns
- EADDRNOTAVAIL to the application.
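The order of the decisions above can be sketched as follows. This is a deliberately simplified model and not the kernel code: it keeps the three steps, the two KAME-specific rules, and "Use longest matching prefix", and it leaves out every other RFC3484 rule (scope, deprecated addresses, the policy table, and so on). Candidate, select_source and the field names are all hypothetical; returning None stands in for EADDRNOTAVAIL.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    addr: str
    if_up: bool = True          # interface has the UP flag on
    if_preferred: bool = False  # interface marked "preferred" by the administrator


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv6 addresses."""
    diff = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    return 128 - diff.bit_length()


def select_source(dst: str,
                  pktinfo_src: Optional[str] = None,
                  bound_addr: Optional[str] = None,
                  candidates: Optional[List[Candidate]] = None) -> Optional[str]:
    if pktinfo_src is not None:   # step 1: IPV6_PKTINFO overrides everything
        return pktinfo_src
    if bound_addr is not None:    # step 2: the bound address of the socket
        return bound_addr
    if not candidates:            # step 3 cannot pick anything
        return None
    # Earlier tuple elements win, so the rules apply in the listed order.
    best = max(candidates,
               key=lambda c: (c.if_up, c.if_preferred,
                              common_prefix_len(c.addr, dst)))
    return best.addr


if __name__ == "__main__":
    cands = [Candidate("2001:db8:1::10", if_up=False), Candidate("2001:db8:2::10")]
    print(select_source("2001:db8:1::1", candidates=cands))
```

In the usage example the address on the interface that is UP is chosen, even though the other candidate shares a longer prefix with the destination.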
- In some cases, the specification explicitly requires the
- implementation to choose a particular source address. The source
- address for a Neighbor Advertisement (NA) message is an example.
- Under the spec (RFC2461 7.2.2), the NA's source should be the target
- address of the corresponding NS. In this case we follow the
- spec rather than the above rule.
- If you would like to prohibit the use of deprecated addresses for some
- reason, set net.inet6.ip6.use_deprecated to 0. The issue
- related to deprecated addresses is described in RFC2462 5.5.4 (NOTE:
- there is some debate underway in IETF ipngwg on how to use
- "deprecated" addresses).
- As documented in the source address selection document, temporary
- addresses for privacy extension are less preferred than public addresses
- by default. However, for administrators who are particularly concerned
- about privacy, there is a system-wide sysctl(3) variable
- "net.inet6.ip6.prefer_tempaddr". When the variable is set to
- non-zero, the kernel will prefer temporary addresses instead. The
- default value of this variable is 0.
- 1.6.2 Destination Address Ordering
- KAME's getaddrinfo(3) supports the destination address ordering
- algorithm described in RFC3484. getaddrinfo(3) needs to know the
- source address for each destination address and the policy entries
- (described in the previous section) for the source and destination
- addresses. To get the source address, the library function opens a
- UDP socket and tries to connect(2) to the destination. To get the
- policy entry, the function issues sysctl(3).
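The UDP connect trick can be tried from userland as well. The Python sketch below shows the general technique only, not the libc code: connecting a datagram socket sends no packet, but it makes the kernel pick a source address, which getsockname() then reveals. probe_source is a hypothetical name and the port number is arbitrary.

```python
import socket


def probe_source(dst, port=9):
    """Return the source address the kernel would use for dst, or None.

    None means no source could be selected (no route, or the address
    family is not available on this node).
    """
    family = socket.AF_INET6 if ":" in dst else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as s:
            s.connect((dst, port))     # no packet is sent for UDP
            return s.getsockname()[0]  # address chosen by the kernel
    except OSError:
        return None


if __name__ == "__main__":
    for dst in ("::1", "127.0.0.1"):
        print(dst, "->", probe_source(dst))
```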
- 1.7 Jumbo Payload
- KAME supports the Jumbo Payload hop-by-hop option used to send IPv6
- packets with payloads longer than 65,535 octets. But since KAME
- currently does not support any physical interface whose MTU is more
- than 65,535, such payloads can be seen only on the loopback interface
- (i.e. lo0).
- If you want to try jumbo payloads, you first have to reconfigure the
- kernel so that the MTU of the loopback interface is more than 65,535
- bytes; add the following to the kernel configuration file:
- options "LARGE_LOMTU" #To test jumbo payload
- and recompile the new kernel.
- Then you can test jumbo payloads with the ping6 command using the -b
- and -s options. The -b option must be specified to enlarge the size of the
- socket buffer, and the -s option specifies the length of the packet,
- which should be more than 65,535. For example:
- % ping6 -b 70000 -s 68000 ::1
- The IPv6 specification requires that the Jumbo Payload option must not
- be used in a packet that carries a fragment header. If this condition
- is violated, an ICMPv6 Parameter Problem message must be sent to the
- sender. The KAME kernel follows the specification, but you cannot usually
- see an ICMPv6 error caused by this requirement.
- When the KAME kernel receives an IPv6 packet, it checks the frame length
- of the packet and compares it to the length specified in the payload
- length field of the IPv6 header or in the value of the Jumbo Payload
- option, if any. If the former is shorter than the latter, the KAME kernel
- discards the packet and increments the statistics. You can see the
- statistics in the output of the netstat command with the `-s -p ip6'
- options:
- % netstat -s -p ip6
- ip6:
- (snip)
- 1 with data size < data length
- So, the KAME kernel does not send an ICMPv6 error unless the erroneous
- packet is an actual Jumbo Payload, that is, its packet size is more
- than 65,535 bytes. As described above, the KAME kernel currently does not
- support physical interfaces with such a huge MTU, so it rarely returns an
- ICMPv6 error.
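The length comparison described above can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the ip6_input() logic: it assumes the payload length field of the IPv6 header is zero whenever a Jumbo Payload option is present (so the option value is used instead), and it takes the option value as an argument rather than parsing the hop-by-hop header. All names are hypothetical.

```python
import struct

IP6_HDRLEN = 40  # fixed size of the IPv6 header


def ip6_header(plen, nxt=59, hlim=64):
    """Build a minimal IPv6 header with all-zero addresses (test helper)."""
    return struct.pack("!IHBB", 6 << 28, plen, nxt, hlim) + bytes(32)


def declared_payload_length(header, jumbo_len=None):
    """Payload length from the header, or from the Jumbo Payload option."""
    (plen,) = struct.unpack_from("!H", header, 4)  # 16-bit field at offset 4
    if plen == 0 and jumbo_len is not None:
        return jumbo_len
    return plen


def frame_is_truncated(frame_len, header, jumbo_len=None):
    """True when the received frame is shorter than the declared length."""
    return frame_len - IP6_HDRLEN < declared_payload_length(header, jumbo_len)


if __name__ == "__main__":
    print(frame_is_truncated(100, ip6_header(100)))  # frame too short
    print(frame_is_truncated(140, ip6_header(100)))  # frame long enough
```

A frame for which frame_is_truncated() is true corresponds to the case counted as "with data size < data length" in the netstat output.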
- TCP/UDP over jumbograms is not supported at this moment. This is because
- we have no medium (other than loopback) to test this. Contact us if you
- need this.
- IPsec does not work on jumbograms. This is due to some specification twists
- in supporting AH with jumbograms (the AH header size influences the payload
- length, and this makes it really hard to authenticate an inbound packet that
- carries both the jumbo payload option and AH).
- There are fundamental issues in *BSD support for jumbograms. We would like to
- address those, but we need more time to finalize the task. To name a few:
- - mbuf pkthdr.len field is typed as "int" in 4.4BSD, so it cannot hold
- jumbogram with len > 2G on 32bit architecture CPUs. If we would like to
- support jumbogram properly, the field must be expanded to hold 4G +
- IPv6 header + link-layer header. Therefore, it must be expanded to at least
- int64_t (u_int32_t is NOT enough).
- - We mistakenly use "int" to hold the packet length in many places. We
- need to convert them into a larger numeric type. This needs great care,
- as we may experience overflow during packet length computation.
- - We mistakenly check the ip6_plen field of the IPv6 header for the packet
- payload length in various places. We should be checking mbuf pkthdr.len
- instead. ip6_input() will perform a sanity check on the jumbo payload
- option on input, and we can safely use mbuf pkthdr.len afterwards.
- - TCP code needs careful updates in a bunch of places, of course.
- 1.8 Loop prevention in header processing
- The IPv6 specification allows an arbitrary number of extension headers
- to be placed onto packets. If we implemented the IPv6 packet processing
- code in the way the BSD IPv4 code is implemented, the kernel stack might
- overflow due to a long function call chain. The KAME sys/netinet6 code
- is carefully designed to avoid kernel stack overflow. Because of
- this, the KAME sys/netinet6 code defines its own protocol switch
- structure, "struct ip6protosw" (see netinet6/ip6protosw.h).
- In addition to this, we restrict the number of extension headers
- (including the IPv6 header) in each incoming packet, in order to
- prevent a DoS attack that tries to send packets with a massive number
- of extension headers. The upper limit can be configured by the sysctl
- value net.inet6.ip6.hdrnestlimit. In particular, if the value is 0,
- the node will allow an arbitrary number of headers. As of writing this
- document, the default value is 50.
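The idea of walking the header chain in a loop, with an upper bound on the number of headers, can be sketched in Python as follows. This is an illustration, not the kernel code: only three extension header types are recognized, the IPv6 header is counted as the first header, and whether the kernel rejects at exactly the limit or one past it is an assumption here. count_headers and EXT_HEADERS are hypothetical names.

```python
# Next-header values walked by this sketch: hop-by-hop options (0),
# routing (43) and destination options (60).
EXT_HEADERS = {0, 43, 60}


def count_headers(first_nxt, payload, limit=50):
    """Walk the extension header chain iteratively and count headers.

    The IPv6 header itself counts as one. limit=0 means "no limit".
    Raises ValueError when the limit is exceeded or the chain is cut short.
    """
    nest = 1
    nxt, off = first_nxt, 0
    while nxt in EXT_HEADERS:  # a loop, so no call chain builds up
        nest += 1
        if limit and nest > limit:
            raise ValueError("too many headers")
        if off + 2 > len(payload):
            raise ValueError("truncated extension header")
        nxt = payload[off]                  # next header field
        off += (payload[off + 1] + 1) * 8   # length is in 8-octet units
    return nest


if __name__ == "__main__":
    one = bytes([60, 0] + [0] * 6)   # destination options, another follows
    last = bytes([59, 0] + [0] * 6)  # destination options, then "no next header"
    print(count_headers(60, one * 2 + last))
```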
- The IPv4 part (sys/netinet) remains untouched for compatibility.
- Because of this, if you receive an IPsec-over-IPv4 packet with a massive
- number of IPsec headers, the kernel stack may blow up. IPsec-over-IPv6 is
- okay.
- 1.9 ICMPv6
- After RFC2463 was published, IETF ipngwg decided to disallow ICMPv6 error
- packets against ICMPv6 redirects, to prevent an ICMPv6 storm on a network
- medium. KAME already implements this in the kernel.
- RFC2463 requires rate limitation for ICMPv6 error packets generated by a
- node, to avoid possible DoS attacks. The KAME kernel implements two
- rate-limitation mechanisms, tunable via sysctl:
- - Minimum time interval between ICMPv6 error packets
- The KAME kernel will generate no more than one ICMPv6 error packet
- during the configured time interval. net.inet6.icmp6.errratelimit
- controls the interval (default: disabled).
- - Maximum ICMPv6 error packet-per-second
- The KAME kernel will generate no more than the configured number of
- packets in one second. net.inet6.icmp6.errppslimit controls the
- maximum packet-per-second value (default: 200pps)
- Basically, we need to pick values that are suitable for the bandwidth
- of the link-layer devices directly attached to the node. In some cases
- the default values may not fit well. We are still unsure whether the
- default value is sane or not. Comments are welcome.
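To get a feeling for the packet-per-second limit, the following Python sketch models it with a one-second counting window. The windowing scheme is an assumption chosen for simplicity, and the kernel may count differently; PpsLimiter is a hypothetical class, and only the default of 200 is taken from the text above. The clock value is passed in so that the behavior is easy to reproduce.

```python
class PpsLimiter:
    """Allow at most max_pps events within each one-second window."""

    def __init__(self, max_pps=200):
        self.max_pps = max_pps
        self.window_start = None
        self.count = 0

    def allow(self, now):
        """Return True if an ICMPv6 error may be sent at time `now` (seconds)."""
        if self.window_start is None or now - self.window_start >= 1.0:
            # Start a fresh window and forget the old count.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.max_pps


if __name__ == "__main__":
    lim = PpsLimiter(max_pps=2)
    print([lim.allow(0.0), lim.allow(0.1), lim.allow(0.2), lim.allow(1.0)])
```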
- 1.10 Applications
- For userland programming, we support the IPv6 socket API as specified in
- RFC2553/3493, RFC3542 and upcoming internet drafts.
- TCP/UDP over IPv6 is available and quite stable. You can enjoy "telnet",
- "ftp", "rlogin", "rsh", "ssh", etc. These applications are protocol
- independent. That is, they automatically choose IPv4 or IPv6
- according to DNS.
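What "protocol independent" means in practice is roughly the loop below: ask the resolver for every address of the peer and try them in the order returned, without naming an address family in the program. This Python sketch uses the standard socket module; candidate_endpoints and connect_any are hypothetical names, and the real applications are of course written in C against getaddrinfo(3).

```python
import socket


def candidate_endpoints(host, port):
    """Return (family, sockaddr) pairs for host, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_any(host, port, timeout=5.0):
    """Try each candidate in turn and return the first connected socket."""
    last_error = None
    for family, sockaddr in candidate_endpoints(host, port):
        try:
            s = socket.socket(family, socket.SOCK_STREAM)
        except OSError as e:  # family not supported on this node
            last_error = e
            continue
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
            return s
        except OSError as e:
            last_error = e
            s.close()
    raise last_error or OSError("no addresses for %r" % host)


if __name__ == "__main__":
    print(candidate_endpoints("127.0.0.1", 23))
    print(candidate_endpoints("::1", 23))
```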
- 1.11 Kernel Internals
- (*) TCP/UDP part is handled differently between operating system platforms.
- See 1.12 for details.
- The current KAME has escaped from the IPv4 netinet logic. While
- ip_forward() calls ip_output(), ip6_forward() directly calls
- if_output() since routers must not divide IPv6 packets into fragments.
- An ICMPv6 error should contain as much of the original packet as
- possible, up to 1280 bytes. A UDP6/IP6 port unreach, for instance, should
- contain all extension headers and the *unchanged* UDP6 and IP6 headers.
- So, all IP6 functions except TCP6 never convert network byte
- order into host byte order, to preserve the original packet.
- tcp6_input(), udp6_input() and icmp6_input() can't assume that the IP6
- header immediately precedes the transport headers, due to extension
- headers. So, in6_cksum() was implemented to handle packets whose IP6
- header and transport header are not contiguous. Neither a TCP/IP6 nor a
- UDP/IP6 header structure exists for checksum calculation.
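The point about non-contiguous headers can be seen in a userland model of the checksum. The Python sketch below is not in6_cksum(); it computes the standard upper-layer checksum over the IPv6 pseudo-header, taking the addresses and the next header value as separate arguments so that nothing assumes the IP6 header sits directly in front of the transport header. The function names and the sample UDP datagram are hypothetical.

```python
import ipaddress
import struct


def ones_complement_sum(data):
    """16-bit one's complement sum over data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return total


def upper_layer_checksum(src, dst, next_header, payload):
    """Checksum over pseudo-header (src, dst, length, next header) + payload."""
    pseudo = (ipaddress.IPv6Address(src).packed
              + ipaddress.IPv6Address(dst).packed
              + struct.pack("!I3xB", len(payload), next_header))
    return (~ones_complement_sum(pseudo + payload)) & 0xFFFF


if __name__ == "__main__":
    data = b"hello"
    # UDP header with the checksum field set to zero, followed by the data.
    udp = struct.pack("!HHHH", 1024, 53, 8 + len(data), 0) + data
    csum = upper_layer_checksum("2001:db8::1", "2001:db8::2", 17, udp)
    filled = udp[:6] + struct.pack("!H", csum) + udp[8:]
    # Recomputing over the completed datagram yields zero when it is intact.
    print(upper_layer_checksum("2001:db8::1", "2001:db8::2", 17, filled))
```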
- To process the IP6 header, extension headers and transport headers easily,
- KAME requires network drivers to store packets in one internal mbuf or
- one or more external mbufs. A typical old driver prepares two
- internal mbufs for 100 - 208 bytes of data; KAME's reference
- implementation, however, stores it in one external mbuf.
- "netstat -s -p ip6" tells you whether or not your driver conforms to
- KAME's requirement. In the following example, "cce0" violates the
- requirement. (For more information, refer to Section 2.)
- M…