Introductions: Introductions Day 1: Performance and Monitoring
Li Xinman, TEIN2 NOC & CERNET NOC, PhD
Day 2: Troubleshooting
Li Pengfei, CERNET NOC, CCIE
Day 3: Emergency Response
Wang Yan, CERNET NOC, CCIE
Performance & Monitoring: Performance & Monitoring Li Xinman
TEIN2 NOC, CERNET NOC
Sept.4-8, 2006
AIT, Thailand
Agenda: Agenda Introduction to Performance Management
TEIN2 NOC updates and NMS
Performance Monitoring technologies and tools
Netflow and applications
Case Study
Functions of Network Management: Functions of Network Management Fault management
Network state monitoring
Failure logging, reporting and tracking etc.
Configuration management
device and software configuration
version control (compare, apply and rollback, backup) etc.
Accounting management
billing and traffic measurement etc.
Performance management
Security Management
Access control, worm/attack detection and alert etc.
Performance Management-Why: Performance Management-Why Why needed and important?
Capacity planning
when do we need to upgrade our link and device?
Ensure network availability
Verify network performance, verify QoS (we expected)
Ensure SLA compliance (customer expected)
Better understanding and control of network
Optimization: make the network run better!
Murphy’s Law (also why we need a NOC)
If anything can go wrong, it will.
Left to themselves, things tend to go from bad to worse.
(The network can’t look after itself. That’s nice for us.)
Proactive or reactive?
Know about the problem before the users and the boss do
Solve the problem before they complain
Or
Wait for the problem to happen and for customers to complain?
As a NOC, we should be proactive: NOC means NO Complaints!
Performance Management-What: Performance Management-What What’s performance management?
understanding the behavior of a network and its elements in response to traffic demands
Measuring and reporting network performance to ensure that it is maintained at an acceptable level
Performance Management-How: Performance Management-How How to measure the network performance
Delay, jitter, packet loss, bandwidth usage etc.
The steps and process of performance management:
Data collection
Baseline the network
Determining the threshold for acceptable performance
Tuning
Technologies and tools needed
Data collection technologies such as: sniffing & netflow
QoS
Tools: ping, mrtg, iperf, wget, etc.
Delay (Latency): Delay (Latency) Delay = propagation delay + serialization delay + queuing/processing delay
Propagation delay: the time it takes the physical signal to traverse the path; depends on distance (add about 6 ms per 1000 km of fibre link)
The delay from Beijing to Guangzhou is about 34 ms (CERNET); the distance is about 3000 km.
Serialization delay: the time it takes to actually transmit the packet onto the link (packet size divided by link speed)
Queuing, processing and switching delay: added by each intermediate networking device (normally less than 1 ms per device, but more for firewalls or heavily loaded routers)
Comfortable human-to-human audio is only possible for round-trip delays not greater than 100ms
Tools: ping, traceroute etc.
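A minimal sketch of the propagation-delay rule of thumb above (about 6 ms per 1000 km of fibre). The function name is hypothetical, and comparing the doubled value with the 34 ms Beijing to Guangzhou figure assumes that figure is a round-trip time, which the slide does not state.

```python
MS_PER_1000_KM = 6.0  # rule of thumb quoted on the slide


def propagation_delay_ms(distance_km, ms_per_1000_km=MS_PER_1000_KM):
    """Estimate one-way propagation delay over fibre, in milliseconds."""
    return distance_km / 1000.0 * ms_per_1000_km


one_way = propagation_delay_ms(3000)   # Beijing to Guangzhou, about 3000 km
print(one_way)        # 18.0
print(2 * one_way)    # 36.0 (estimated round trip)
```

The estimate covers propagation only; device and queuing delay come on top of it.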
Jitter: Jitter is the variation of the delay, a.k.a the 'latency variance,' can happen because:
variable queue length generates variable latencies
Load balancing with unequal latency
In general, higher levels of jitter are more likely to occur on either slow or heavily congested links. It is expected that the increasing use of “QoS” control mechanisms such as class-based queuing and bandwidth reservation, and of higher-speed links such as 100M Ethernet, will reduce jitter-related problems
Harmless for many applications, but not for real-time applications such as voice and video
Applications need a jitter buffer to smooth playback
Tolerable Jitter range for VOIP is: 20ms – 30ms
Tools: ping etc. J1 = abs(t2-t1), J2=abs(t3-t2), ….
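The jitter formula above (J1 = abs(t2-t1), J2 = abs(t3-t2), ...) can be sketched as follows. The sample delays are made-up values, and averaging the samples is just one simple way to summarise them, not something the slide prescribes.

```python
def jitter_samples(delays_ms):
    """Return [J1, J2, ...] where Jn = abs(t[n+1] - t[n])."""
    return [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]


def mean_jitter(delays_ms):
    """Average of the jitter samples; 0.0 when fewer than two delays exist."""
    samples = jitter_samples(delays_ms)
    return sum(samples) / len(samples) if samples else 0.0


delays = [20.0, 24.0, 21.0, 30.0]        # hypothetical ping delays in ms
print(jitter_samples(delays))            # [4.0, 3.0, 9.0]
print(round(mean_jitter(delays), 2))     # 5.33
```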
Packet Loss: Packet Loss Loss of one or more packets, can happen because ...
Link or hardware caused CRC error
Link is congested or queue is full (tail drop or even RED/WRED)
route change (temporary drop) or blackhole route (persistent drop)
Interface or router down
Misconfigured access-list
...
1% packet loss is terrible and unusable!
Tools: ping etc.
Bandwidth Utilization: Bandwidth Utilization Capacity planning: decide when to upgrade the link, although this may depend on available investment
Better to stay below 35% (as commercial ISPs do)
For CERNET, most links are above 70% and some above 95%; in our view, 70% is acceptable for E&R networks
For TEIN2 now, most links are below 15% !!
Tools: MRTG, SNMP tools, telnet etc.
Network Availability: Network Availability is the metric used to determine uptime and downtime
Availability = (uptime)/(total time) = 1-(downtime)/(total time)
Network availability is the IP layer reachability
Better > 99.9%
99.9%
30 x 24 x 60 x 0.1% = 43.2 minutes, so the downtime should be less than about 45 minutes in one month
99.99%
30x24x60x0.01%=4.3 (Minutes), means the down time should be less than 5 minutes in one month!
99.9% is acceptable for R&E networks (Even 99.0% is acceptable), some commercial ISPs can reach 99.99%
Network devices should be 99.999% available, or as specified, but in practice even the top vendors do not achieve this
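The availability arithmetic above can be checked with a short sketch. The function names are hypothetical; the 30-day month follows the slide's own calculation.

```python
def downtime_budget_minutes(availability_pct, days=30):
    """Allowed downtime (minutes) for a target availability over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


def availability_pct(downtime_minutes, days=30):
    """Availability = 1 - downtime / total time, expressed in percent."""
    total_minutes = days * 24 * 60
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


print(round(downtime_budget_minutes(99.9), 2))    # 43.2
print(round(downtime_budget_minutes(99.99), 2))   # 4.32
print(round(availability_pct(43.2), 2))           # 99.9
```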
Packets Per Second (PPS): Packets Per Second (PPS) Important for performance: metrics such as delay and packet loss are strongly affected by PPS, because queuing and processing delay grows with the load on the intermediate routers
PPS is a very important metric for detecting DoS/DDoS traffic
E.g. if the normal rate on a GE link is about 100,000 pps (baseline) and it rises sharply to 200,000 pps, this suggests a DoS attack.
Easy to get: show interface
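A rough sketch of the baseline check described above. The function name and the factor of 2 are assumptions taken from the 100,000 to 200,000 pps example; a real threshold would be tuned per link.

```python
def pps_alarm(current_pps, baseline_pps, factor=2.0):
    """Flag a possible DoS when PPS reaches `factor` times the baseline."""
    return current_pps >= factor * baseline_pps


print(pps_alarm(200_000, 100_000))   # True
print(pps_alarm(120_000, 100_000))   # False
```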
CPU and Memory Utilization: CPU and Memory Utilization We focus on routers
CPU utilization should preferably stay below 30%
For routers carrying the full global routing table, at least 512 MB of memory is needed
QoS: QoS QoS: Quality Of Service
QoS is technology to manage network performance
QoS is a set of performance measurements
Delay, Jitter, packet loss, availability, bandwidth utilization etc.
IP QoS: QoS for IP service
QoS Architecture: QoS Architecture Best Effort
IntServ
End to end, session state needed
RSVP
CPU and Memory intensive
Difficult to deploy
Not scalable
DiffServ
PHB: Per-Hop-Behavior, Not end-to-end
Scalable
Easy to deploy
What is used now: DiffServ + IP, DiffServ + MPLS
If network bandwidth is enough, there is no need for QoS?
QoS Practice: Traffic Shaping (rate-limit): QoS Practice: Traffic Shaping (rate-limit) 40Mbps for all outbound traffic
interface FastEthernet2/0
rate-limit output 40000000 400000 400000 conform-action transmit exceed-action drop
40Mbps for specific traffic through ACL
interface FastEthernet2/0
rate-limit output access-list 110 40000000 400000 400000 conform-action transmit exceed-action drop
access-list 110 deny tcp any any eq www
access-list 110 deny tcp any eq www any
access-list 110 permit ip any any
QoS Practice: Modular QoS CLI (MQC): QoS Practice: Modular QoS CLI (MQC) 1) Classify the traffic (define the traffic class)
class-map match-any limit-campus
match access-group 170
2) Define the traffic policy
policy-map limit-30M
class limit-campus
police 30000000 30000 30000 conform-action transmit
3) Apply the traffic policy
interface GigabitEthernet5/2
service-policy input limit-30M
service-policy output limit-30M
Traffic classification example: Traffic classification example
SLA and QoS: SLA and QoS SLA: Service Level Agreement
SLA is the agreement between service provider and customer, SLA defines the quality of the service the service provider delivered, such as delay, jitter, packet loss etc.
SLA is a very important part of the business contract, and can also be used to distinguish the service levels of different ISPs. (Diagram: SLA is the business view; QoS is the technology view.)
SLA example: Level 3: SLA example: Level 3 Delay Packet Loss Availability Jitter Bandwidth
SLA example: Sprintlink: SLA example: Sprintlink
Measurement Technology: Measurement Technology We’ve known what metrics used to describe network performance, but how to measure them?
Technologies and tools
ping, traceroute, telnet and CLI commands etc.
SNMP
NetFlow (Cisco), J-Flow (Juniper), NetStream (Huawei), sFlow (multi-vendor)
IP SLA (Cisco)
Etc.
ping: ping Normally used as a troubleshooting tool
Uses ICMP Echo messages to determine:
Whether a remote device is active (for trouble shooting)
round trip time delay (RTT), but not one-way delay
Packet loss
Sometimes we need to specify the source address and packet length, using extended ping on a router or ping options on a host
Why use large packets when pinging?
(To test the link quality and throughput.)
To send large packets, use ping -l <size> on Windows and ping -s <size> on Linux
Sample Ping: Sample Ping Freebsd>% ping 202.112.60.31
PING 202.112.60.31 (202.112.60.31) 56(84) bytes of data.
64 bytes from 202.112.60.31: icmp_seq=1 ttl=253 time=0.326 ms
……
64 bytes from 202.112.60.31: icmp_seq=6 ttl=253 time=0.288 ms
6 packets transmitted, 6 received, 0% packet loss, time 4996ms
rtt min/avg/max/mdev = 0.239/0.284/0.326/0.025 ms
router# ping
Protocol [ip]:
Target IP address: 202.112.60.31
Repeat count [5]:
Datagram size [100]: 3000
Timeout in seconds [2]:
Extended commands [n]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 3000-byte ICMP Echos to 202.112.60.31, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms
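For scripted monitoring, the summary lines of the host ping output above can be parsed as in the sketch below. It assumes the Linux-style summary format shown in the sample; other ping implementations print different text, and the function and key names are hypothetical.

```python
import re

LOSS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received, (\d+(?:\.\d+)?)% packet loss"
)
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(text):
    """Extract packet counts, loss and RTT statistics from ping output."""
    loss = LOSS_RE.search(text)
    rtt = RTT_RE.search(text)
    if loss is None or rtt is None:
        raise ValueError("unrecognised ping summary")
    rtt_min, rtt_avg, rtt_max, rtt_mdev = (float(x) for x in rtt.groups())
    return {
        "sent": int(loss.group(1)),
        "received": int(loss.group(2)),
        "loss_pct": float(loss.group(3)),
        "rtt_min_ms": rtt_min,
        "rtt_avg_ms": rtt_avg,
        "rtt_max_ms": rtt_max,
        "rtt_mdev_ms": rtt_mdev,
    }


sample = (
    "6 packets transmitted, 6 received, 0% packet loss, time 4996ms\n"
    "rtt min/avg/max/mdev = 0.239/0.284/0.326/0.025 ms\n"
)
stats = parse_ping_summary(sample)
print(stats["loss_pct"], stats["rtt_avg_ms"])   # 0.0 0.284
```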
traceroute: traceroute Can be used to measure the RTT delay, and also the delay between the routers along the path
Unix/Linux traceroute uses UDP datagrams with increasing TTL values to discover the route a packet takes to the destination; Microsoft Windows tracert uses ICMP echo instead. If Windows tracert shows continuous timeouts, a router may be filtering ICMP traffic; try a Unix/Linux traceroute
After the Nachi worm, many ISPs filter ICMP echo traffic, so ping may not work while UDP-based traceroute still does. (Diagram: path H1, router1, router2, router3, with delays labelled 2 ms, 15 ms, 2 ms and 19 ms.)
Sample Traceroute: Sample Traceroute Router# traceroute 202.112.60.37
Type escape sequence to abort.
Tracing the route to 202.112.60.37
1 202.112.53.169 0 msec 0 msec 0 msec
2 202.112.36.250 20 msec 20 msec 16 msec
3 202.112.36.254 28 msec 28 msec 24 msec
4 202.112.53.202 24 msec * 24 msec
Visual Route: Visual Route Visualization of traceroute information
http://www.visualroute.com
telnet and CLI commands: telnet and CLI commands Logging in to a network device with telnet, either manually or from scripts written with Expect, and issuing CLI commands is a basic but useful way to collect performance data
It is necessary because some data is only available through CLI commands and is not exposed through SNMP. How about the config file?
Show interface: Show interface Bandwidth utilization information, PPS etc
Examples
show interface GigabitEthernet2/24
GigabitEthernet2/24 is up, line protocol is up (connected)
Description: to-tein2-xing-20060119
Internet address is 202.179.241.26/30
MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 33/255, rxload 14/255
Input queue: 0/75/1/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 55010000 bits/sec, 17367 packets/sec
5 minute output rate 133299000 bits/sec, 18476 packets/sec
L2 Switched: ucast: 235554 pkt, 32942922 bytes - mcast: 44728 pkt, 4631058 bytes
L3 in Switched: ucast: 7786262800 pkt, 2957731471301 bytes - mcast: 0 pkt, 0 bytes mcast
L3 out Switched: ucast: 8883546304 pkt, 7850287572491 bytes mcast: 0 pkt, 0 bytes
......
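It’s better not to change the bandwidth setting (even to adjust the OSPF metric), because the load figures are calculated against it. Here the load is about 13% outbound and 5.5% inbound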
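The 13% and 5.5% figures can be reproduced from the show interface output above, either from the 5-minute rates against the BW value or from the txload/rxload fractions. A small sketch with hypothetical function names:

```python
def utilization_pct(rate_bps, bandwidth_kbit):
    """Link utilization in percent from a bit rate and the interface BW (Kbit)."""
    return 100.0 * rate_bps / (bandwidth_kbit * 1000)


def load_pct(load, scale=255):
    """Convert an IOS load value such as txload 33/255 to percent."""
    return 100.0 * load / scale


# Values taken from the GigabitEthernet2/24 sample output
print(round(utilization_pct(133_299_000, 1_000_000), 1))   # 13.3 (output)
print(round(utilization_pct(55_010_000, 1_000_000), 1))    # 5.5  (input)
print(round(load_pct(33), 1))                              # 12.9 (txload)
print(round(load_pct(14), 1))                              # 5.5  (rxload)
```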
Show process cpu/mem: Show process cpu/mem Measure the usage of CPU and memory
router1>sh proc cpu
CPU utilization for five seconds: 2%/0%; one minute: 5%; five minutes: 5%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
1 8 91 87 0.00% 0.00% 0.00% 0 Chunk Manager
2 5876 4393609 1 0.00% 0.00% 0.00% 0 Load Meter
3 1400 200869 6 0.00% 0.00% 0.00% 0 BGP Open
4 0 1 0 0.00% 0.00% 0.00% 0 EE48 TCAM Carve
5 50811784 2895942 17545 0.00% 0.25% 0.22% 0 Check heaps
....
Sometimes the CPU usage of the ‘IP Input’ and ‘BGP Scanner’ processes will be very high
Remember not to use up all the telnet (vty) sessions, or you will be locked out of the router.
SNMP: SNMP SNMP is an Internet-standard management framework that provides facilities for managing and monitoring network resources on the Internet
Components of SNMP
MIB: Management Information Base
SNMP agent: software that runs on a network device and maintains the MIB
SNMP manager: application program that contacts the agent to query or modify the MIB at the agent
SNMP protocol: the application-layer protocol used by SNMP agents and managers to send and receive data; the data is encoded in BER
SMI: Structure of Management Information, the standard that defines how to create a MIB
SNMP Architecture: SNMP Architecture
MIBs: MIBs A MIB specifies the managed objects
A MIB is a text file that describes managed objects using the syntax of ASN.1 (Abstract Syntax Notation One)
ASN.1 is a formal language for describing data and its properties
In Linux, MIB files are in the directory /usr/share/snmp/mibs
Multiple MIB files
RFC1213-MIB.txt, MIB-II (defined in RFC 1213) defines the managed objects of TCP/IP networks
Managed Objects: Managed Objects Each managed object is assigned an object identifier (OID)
The OID is specified in a MIB file.
An OID can be represented as a sequence of integers separated by dots or as a text string:
Example:
1.3.6.1.2.1.4.6 (looks like an IPv6 address?)
iso.org.dod.internet.mgmt.mib-2.ip.ipForwDatagrams
When an SNMP manager requests an object, it sends the OID to the SNMP agent.
Organization of Managed Objects: Organization of Managed Objects Managed objects are organized in a tree-like hierarchy and the OIDs reflect the structure of the hierarchy.
Each OID represents a node in the tree.
The OID 1.3.6.1.2.1 (iso.org.dod.internet.mgmt.mib-2) is at the top of the hierarchy for all managed objects of the MIB-II.
Manufacturers of networking equipment can add product specific objects to the hierarchy.
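To illustrate how the numeric and textual forms of an OID follow the same tree, here is a small sketch. The lookup table is a hand-written excerpt covering only the example above, not a real MIB parser, and the function name is hypothetical.

```python
# Hand-written excerpt of the OID tree: numeric prefix -> node name
MIB_NAMES = {
    (1,): "iso",
    (1, 3): "org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 4): "ip",
    (1, 3, 6, 1, 2, 1, 4, 6): "ipForwDatagrams",
}


def oid_to_name(oid):
    """Translate a dotted OID to names, keeping unknown nodes as numbers."""
    parts = tuple(int(p) for p in oid.strip(".").split("."))
    labels = []
    for depth in range(1, len(parts) + 1):
        prefix = parts[:depth]
        labels.append(MIB_NAMES.get(prefix, str(parts[depth - 1])))
    return ".".join(labels)


print(oid_to_name("1.3.6.1.2.1.4.6"))
# iso.org.dod.internet.mgmt.mib-2.ip.ipForwDatagrams
```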
Definition of Managed Object in a MIB: Definition of Managed Object in a MIB OBJECT-TYPE
String that describes the MIB object.
Object Identifier (OID)
SYNTAX
Defines what kind of info is stored in the MIB object
ACCESS
READ-ONLY, READ-WRITE
STATUS
State of the object with regard to the SNMP community
DESCRIPTION
Reason why the MIB object exists
Standard MIB Object:
sysUpTime OBJECT-TYPE
SYNTAX TimeTicks
ACCESS read-only
STATUS mandatory
DESCRIPTION
“Time since the network management portion of the system was last re-initialised.”
::= {system 3}
IF-MIB (64-bit counters): IF-MIB (64-bit counters)
SNMP Protocol: SNMP Protocol Client/server based: the manager pulls (polling) and the agent pushes (traps)
Ports: UDP 161(snmp messages), UDP 162(trap messages)
SNMP manager and an SNMP agent communicate using the SNMP protocol
Generally: Manager sends queries and agent responds
Exception: Traps are initiated by agent.
SNMP Functions: SNMP Functions Get-request. Requests the values of one or more objects
Get-next-request. Requests the value of the next object, according to a lexicographical ordering of OIDs.
Set-request. A request to modify the value of one or more objects
Get-response. Sent by SNMP agent in response to a get-request, get-next-request, or set-request message.
Trap. An SNMP trap is a notification sent by an SNMP agent to an SNMP manager, which is triggered by certain events at the agent
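The lexicographical ordering used by get-next-request, and the way snmpwalk repeats it, can be sketched with a toy in-memory agent. The dictionary below is a made-up stand-in for an agent's MIB and the function names are hypothetical; no real SNMP library is involved.

```python
def parse_oid(text):
    """Turn '1.3.6.1' into (1, 3, 6, 1) so OIDs compare numerically."""
    return tuple(int(p) for p in text.strip(".").split("."))


def get_next(agent_mib, oid):
    """Return (oid, value) of the first object after `oid`, or None."""
    target = parse_oid(oid)
    for candidate in sorted(agent_mib, key=parse_oid):
        if parse_oid(candidate) > target:
            return candidate, agent_mib[candidate]
    return None


def walk(agent_mib, root):
    """Repeat get-next until the result leaves the `root` subtree."""
    root_parts = parse_oid(root)
    results, current = [], root
    while True:
        found = get_next(agent_mib, current)
        if found is None or parse_oid(found[0])[:len(root_parts)] != root_parts:
            return results
        results.append(found)
        current = found[0]


toy_mib = {   # toy agent data
    "1.3.6.1.2.1.1.5.0": "cernoclab",
    "1.3.6.1.2.1.1.3.0": 494755800,
    "1.3.6.1.2.1.1.4.0": "CERNET NOC",
    "1.3.6.1.2.1.2.1.0": 3,
}
print(get_next(toy_mib, "1.3.6.1.2.1.1"))     # ('1.3.6.1.2.1.1.3.0', 494755800)
print(len(walk(toy_mib, "1.3.6.1.2.1.1")))    # 3
```

Comparing integer tuples rather than strings matters: as text, "10" would sort before "9".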
Traps: Traps Traps are triggered by an event
Defined traps include:
linkDown - event indicating that an interface went down
coldStart - unexpected restart (i.e., system crash)
warmStart - soft reboot
linkUp - the opposite of linkDown
(SNMP) AuthenticationFailure
…
Traps can be received by a management application and handled in several ways: logged, paged, alerted on, or ignored completely
SNMP Versions: SNMP Versions Three versions are in use today:
SNMPv1 (1990)
SNMPv2c (1996)
Adds “GetBulk” function and some new data types (such as 64 bit counters)
Adds RMON (remote monitoring) capability
SNMPv2c is the only SNMPv2 variant endorsed by the IETF; others with security features, such as SNMPv2u and SNMPv2*, were not
SNMPv3 (2002)
SNMPv3 started from SNMPv1 (and not SNMPv2c)
Addresses security
All versions are still used today, but versions 1 and 2c are the most common; don’t bother with version 3 unless it is necessary
Many SNMP agents and managers support all three versions of the protocol
SNMP Community Strings: SNMP Community Strings Like passwords
Two kinds:
READ-ONLY: You can send out a Get & GetNext to the SNMP agent, and if the agent is using the same read-only string it will process the request.
READ-WRITE: Get, GetNext, and Set. If a MIB object has an ACCESS value of read-write, then a Set PDU can change the value of that object with the correct read-write community string.
Default community string: public (read), private (write)
Keep the R/W community string secret! In fact, an RW community is often not necessary at all!
SNMP Security: SNMP Security SNMPv1 uses community strings for authentication, sent as plain text without encryption
SNMPv2 was supposed to fix security problems, but effort de-railed (The “c” in SNMPv2c stands for “community”).
SNMPv3 has numerous security features: Integrity, authentication and privacy
Instead of granting access rights to a community, SNMPv3 grants access to users
Access can be restricted to sections of the MIB with the View-based Access Control Model (VACM). Access rights can be limited
by specifying a range of valid IP addresses for a user or community,
or by specifying the part of the MIB tree that can be accessed
SNMP Configuration: SNMP Configuration Configuring SNMP access
snmp-server community notpublic ro
snmp-server community topsecret rw 60
access-list 60 permit 10.1.1.1
access-list 60 permit 10.2.2.2
Configuring Traps
snmp-server host 10.1.1.1 public
snmp-server enable traps
snmp-server enable traps bgp
snmp-server enable traps snmp bgp
snmp-server trap-source loopback 0
About View (for security)
! 1.3.6.1.2.1 is mib-2
snmp-server view testview 1.3.6.1.2.1 included
! 1.3.6.1.4.1.9 is the cisco enterprise subtree
snmp-server view testview 1.3.6.1.4.1.9 included
snmp-server community test1 view testview ro 60
ifIndex – Interface Name?: ifIndex – Interface Name? The ifIndex is the unique value that identifies an interface on a router
show snmp mib ifmib ifindex interface
shows the ifIndex of an interface, e.g.
(router)#sh snmp mib ifmib ifindex pos9/0
Interface = POS9/0, Ifindex = 28
Or use snmpwalk?
Most management software, such as MRTG, uses the ifIndex for data collection and monitoring; in SNMP it is part of the OID
But it may change after a router reboot
snmp-server ifindex persist
keeps the ifIndex from changing across reboots
System MIB (MIB-II): System MIB (MIB-II) .1.3.6.1.2.1.1
.iso.org.dod.internet.mgmt.mib-2.system
.1.3.6.1.2.1.1.1
.iso.org.dod.internet.mgmt.mib-2.system.sysDescr
.1.3.6.1.2.1.1.2
.iso.org.dod.internet.mgmt.mib-2.system.sysObjectID
.1.3.6.1.2.1.1.3
.iso.org.dod.internet.mgmt.mib-2.system.sysUpTime
.1.3.6.1.2.1.1.4
.iso.org.dod.internet.mgmt.mib-2.system.sysContact
.1.3.6.1.2.1.1.5
.iso.org.dod.internet.mgmt.mib-2.system.sysName
MIB instances: MIB instances Each MIB object can have one instance; some have more
A MIB for a router’s (entity) interface information:
iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) interfaces(2) ifTable(2) ifEntry(1)
Requires one ifEntry row per interface (e.g. 3)
One MIB object definition can represent multiple instances through Tables, Entries, and Indexes
ENTRY + INDEX = INSTANCE
ifType(3): ifType.1 = 6, ifType.2 = 9, ifType.3 = 15
ifMtu(4): ifMtu.1, ifMtu.2, ifMtu.3
Etc.
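The ENTRY + INDEX = INSTANCE rule can be written out as a short sketch that builds the instance OID for a column of ifEntry. The column numbers are the standard MIB-II ones; the function name is hypothetical.

```python
IF_ENTRY = "1.3.6.1.2.1.2.2.1"   # iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry

# Column numbers within ifEntry (MIB-II)
IF_COLUMNS = {
    "ifIndex": 1,
    "ifDescr": 2,
    "ifType": 3,
    "ifMtu": 4,
    "ifInOctets": 10,
    "ifOutOctets": 16,
}


def instance_oid(column, if_index):
    """Build the OID of one table cell: entry + column + interface index."""
    return f"{IF_ENTRY}.{IF_COLUMNS[column]}.{if_index}"


print(instance_oid("ifType", 3))       # 1.3.6.1.2.1.2.2.1.3.3
print(instance_oid("ifInOctets", 1))   # 1.3.6.1.2.1.2.2.1.10.1
```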
SNMP Operation: snmpget: SNMP Operation: snmpget Example 1:
MIB:
1.3.6.1.2.1.1.1
iso.org.dod.internet.mgmt.mib-2.system.sysDescr
Results:
$ snmpget -v 1 202.112.0.156 test888 .1.3.6.1.2.1.1.1.0
system.sysDescr.0 = Cisco Internetwork Operating System Software
IOS (tm) C2600 Software (C2600-I-M), Version 12.2(11)T3, RELEASE SOFTWARE (fc2)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2002 by cisco Systems, Inc.
Compiled Sun 22-Dec-02 02:49 by ccai
Example 2:
MIB:
1.3.6.1.2.1.1.3
iso.org.dod.internet.mgmt.mib-2.system.sysUpTime
Results:
$ snmpget -v 2c 202.112.0.156 test888 .1.3.6.1.2.1.1.3.0
system.sysUpTime.0 = Timeticks: (494755800) 57 days, 6:19:18.00
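sysUpTime is a TimeTicks value counted in hundredths of a second. The sketch below converts the raw value to the days and hours:minutes:seconds form printed by snmpget; the function name is hypothetical.

```python
def format_timeticks(ticks):
    """Convert TimeTicks (1/100 s) to 'D days, H:MM:SS.hh'."""
    total_seconds, hundredths = divmod(ticks, 100)
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days} days, {hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"


print(format_timeticks(494755800))   # 57 days, 6:19:18.00
```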
SNMP Operation: snmpset: SNMP Operation: snmpset MIB
1.3.6.1.2.1.1.4
iso.org.dod.internet.mgmt.mib-2.system.sysContact
Operation
$ snmpget -v 1 202.112.0.xxx write888 .1.3.6.1.2.1.1.4.0
system.sysContact.0 = test
$ snmpset -v 1 202.112.0.xxx write888 .1.3.6.1.2.1.1.4.0 s "CERNET NOC"
system.sysContact.0 = CERNET NOC
$ snmpget -v 1 202.112.0.xxx write888 .1.3.6.1.2.1.1.4.0
system.sysContact.0 = CERNET NOC
SNMP Operation: snmpwalk: SNMP Operation: snmpwalk MIB
1.3.6.1.2.1.1
iso.org.dod.internet.mgmt.mib-2.system
Operation
$ snmpwalk -v 2c 202.112.0.xxx test888 .1.3.6.1.2.1.1
system.sysDescr.0 = Cisco Internetwork Operating System Software
IOS (tm) C2600 Software (C2600-I-M), Version 12.2(11)T3, RELEASE SOFTWARE (fc2)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2002 by cisco Systems, Inc.
Compiled Sun 22-Dec-02 02:49 by ccai
system.sysObjectID.0 = OID: enterprises.9.1.208
system.sysUpTime.0 = Timeticks: (494811433) 57 days, 6:28:34.33
system.sysContact.0 = "CERNET NOC, 86-10-62784048"
system.sysName.0 = cernoclab
system.sysLocation.0 = "THU Main Building Room306"
system.sysServices.0 = 78
system.sysORLastChange.0 = Timeticks: (0) 0:00:00.00
SNMP Operation: snmpbulkget: SNMP Operation: snmpbulkget MIB
1.3.6.1.2.1.1
iso.org.dod.internet.mgmt.mib-2.system
Operation
$ snmpbulkget -v 2c -B 0 10 202.112.0.xxx test888 .1.3.6.1.2.1.1
system.sysDescr.0 = Cisco Internetwork Operating System Software
IOS (tm) C2600 Software (C2600-I-M), Version 12.2(11)T3, RELEASE SOFTWARE (fc2)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2002 by cisco Systems, Inc.
Compiled Sun 22-Dec-02 02:49 by ccai
system.sysObjectID.0 = OID: enterprises.9.1.208
system.sysUpTime.0 = Timeticks: (494914259) 57 days, 6:45:42.59
system.sysContact.0 = CERNET NOC
system.sysName.0 = cernoclab
system.sysLocation.0 = "THU Main Building Room306"
system.sysServices.0 = 78
system.sysORLastChange.0 = Timeticks: (0) 0:00:00.00
interfaces.ifNumber.0 = 3
interfaces.ifTable.ifEntry.ifIndex.1 = 1
Interface MIB (MIB-II, 32bit counters): Interface MIB (MIB-II, 32bit counters) 1.3.6.1.2.1.2
iso.org.dod.internet.mgmt.mib-2.interfaces
1.3.6.1.2.1.2.1
.ifNumber
1.3.6.1.2.1.2.2
.ifTable
1.3.6.1.2.1.2.2.1
.ifTable.ifEntry
1.3.6.1.2.1.2.2.1.2
.ifTable.ifEntry.ifDescr
1.3.6.1.2.1.2.2.1.10
.ifTable.ifEntry.ifInOctets
1.3.6.1.2.1.2.2.1.16
.ifTable.ifEntry.ifOutOctets
Interface MIB (MIB-II) Operation: Interface MIB (MIB-II) Operation $ snmpget -v 2c 202.112.0.xxx test888 .1.3.6.1.2.1.2.2.1.2.1
interfaces.ifTable.ifEntry.ifDescr.1 = FastEthernet0/0
$ snmpget -v 2c 202.112.0.xxx test888 .1.3.6.1.2.1.2.2.1.10.1
interfaces.ifTable.ifEntry.ifInOctets.1 = Counter32: 2984051368
$ snmpget -v 2c 202.112.0.xxx test888 .1.3.6.1.2.1.2.2.1.16.1
interfaces.ifTable.ifEntry.ifOutOctets.1 = Counter32: 490955885
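ifInOctets and ifOutOctets are counters, so bandwidth comes from the difference between two polls. A minimal sketch, assuming at most one wrap of the 32-bit counter between polls; the function names and sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, allowing for one wrap."""
    return (current - previous) % modulus


def rate_bps(previous_octets, current_octets, interval_s,
             modulus=COUNTER32_MODULUS):
    """Average bit rate over the polling interval."""
    return counter_delta(previous_octets, current_octets, modulus) * 8 / interval_s


print(counter_delta(4294967000, 704))    # 1000 (counter wrapped once)
print(rate_bps(0, 3_750_000, 300))       # 100000.0 bits/sec over 5 minutes
```

At 1 Gbps a 32-bit octet counter wraps in roughly 34 seconds, so fast links need the 64-bit counters of the IF-MIB or a much shorter polling interval.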
Cisco Interface MIB: Cisco Interface MIB .1.3.6.1.4.1.9.2.2.1.1
.iso.org.dod.internet.private.enterprises.cisco.local.interfaces.lifTable.lifEntry
.1.3.6.1.4.1.9.2.2.1.1.1
.locIfHardType
.1.3.6.1.4.1.9.2.2.1.1.28
.locIfDescr
.1.3.6.1.4.1.9.2.2.1.1.6
.locIfInBitsSec
.1.3.6.1.4.1.9.2.2.1.1.7
.locIfInPktsSec
.1.3.6.1.4.1.9.2.2.1.1.8
.locIfOutBitsSec
.1.3.6.1.4.1.9.2.2.1.1.9
.locIfOutPktsSec
Cisco Interface MIB Operation: Cisco Interface MIB Operation Operation
$ snmpget -v 2c 202.112.xx.xx public .1.3.6.1.4.1.9.2.2.1.1.28.159
enterprises.9.2.2.1.1.28.159 = "bj-a1 to bj1 10G"
$ snmpget -v 2c 202.112.xx.xx public .1.3.6.1.4.1.9.2.2.1.1.1.159
enterprises.9.2.2.1.1.1.159 = "C6k 10000Mb 802.3"
$ snmpget -v 2c 202.112.xx.xx public .1.3.6.1.4.1.9.2.2.1.1.6.159
enterprises.9.2.2.1.1.6.159 = 1179992000
$ snmpget -v 2c 202.112.xx.xx public .1.3.6.1.4.1.9.2.2.1.1.8.159
enterprises.9.2.2.1.1.8.159 = 1835180000
Show interface
bj-a1-bgw#sh int te7/3
TenGigabitEthernet7/3 is up, line protocol is up (connected)
Hardware is C6k 10000Mb 802.3, address is 0014.a9f7.be80 (bia 0014.a9f7.be80)
Description: bj-a1 to bj1 10G
5 minute input rate 1177610000 bits/sec, 327712 packets/sec
5 minute output rate 1835759000 bits/sec, 358057 packets/sec
RMON: RMON Remote Monitoring Specification: provides standard information that a network administrator can use to monitor, analyze, and troubleshoot a group of distributed local area networks (LANs) and interconnecting lines from a central site
RMON is for traffic management
specified as part of the MIB and an extension of SNMP
the latest level is RMON Version 2 (referred to as "RMON 2" or "RMON2")
RMON can be supported by hardware monitoring devices (known as "probes") or through software or some combination
Diagram of RMON MIB: Diagram of RMON MIB (Diagram: the OID tree from Root through ISO, Org, DoD and Internet, branching into Mgmt and Private; MIB 1 and MIB 2 under Mgmt; RMON, with RMON1 and RMON2, under MIB 2.)
RMON MIB Groups: RMON MIB Groups Statistics - Traffic and error rates on a segment
History - Above statistics with a time stamp
Alarm - User defined threshold alarms on any RMON variable
Hosts - Traffic and error rates for each host by MAC address
Host Top N - Sorts hosts by top traffic and/or error rates
Matrix - Conversation matrix between hosts
Filter - Definition of what packet types to capture and store
Packet Capture - Creates a capture buffer on the probe that can be requested and decoded by the management application
Event - Generates log entries and/or SNMP traps
Token Ring - Token Ring extensions, most complex group
RMON2: RMON2 RMON2 is the standard for monitoring higher protocol layers. (Diagram: RMON covers the Physical and Data Link layers; RMON2 covers the Network, Transport, Session, Presentation and Application layers.)
SNMP Tools: SNMP Tools CLI Commands
snmpget, snmpset, snmpwalk, snmpbulkget, etc.
MIB Browser
iReasoning, solarwinds etc
Large Applications: Network Management System
HP OpenView
IBM Tivoli (netview)
Sun NetManager
Etc.
Commercial SNMP Applications: Commercial SNMP Applications http://www.hp.com/go/openview/ HP OpenView
http://www.tivoli.com/ IBM NetView
http://www.novell.com/products/managewise/ Novell ManageWise
http://www.sun.com/solstice/ Sun MicroSystems Solstice
http://www.microsoft.com/smsmgmt/ Microsoft SMS Server
http://www.compaq.com/products/servers/management/ Compaq Insight Manager
http://www.redpt.com/ SnmpQL - ODBC Compliant
http://www.empiretech.com/ Empire Technologies
ftp://ftp.cinco.com/users/cinco/demo/ Cinco Networks NetXray
http://www.netinst.com/html/snmp.html SNMP Collector (Win9X/NT)
http://www.netinst.com/html/Observer.html Observer
http://www.gordian.com/products_technologies/snmp.html Gordian’s SNMP Agent
http://www.castlerock.com/ Castle Rock Computing
http://www.adventnet.com/ Advent Network Management
http://www.smplsft.com/ SimpleAgent, SimpleTester
SNMP Tools-GUI (MIB Browser): SNMP Tools-GUI (MIB Browser)
MRTG: MRTG The Multi Router Traffic Grapher: free software written in Perl that runs on Unix/Linux and graphs data collected via SNMP from routers and other devices or applications.
One of the most popular network monitoring tools in use today, mainly for monitoring the bandwidth utilization of network links
SNMPv2c support: 64-bit counters avoid counter-wrap problems on fast links
http://oss.oetiker.ch/mrtg/
Configuration of MRTG: Configuration of MRTG Use cfgmaker to generate a configuration file, then tune it
cfgmaker public@192.168.1.1 | tee test.cfg
Set up crontab (in /etc/crontab) to run MRTG every 5 minutes
*/5 * * * * wang /usr/bin/mrtg /home/wang/mrtg/test1.cfg
Two basic object types in MRTG
Counter: an object that returns an unsigned integer that grows over time
Gauge: a gauge integer goes up and down according to the variable it tracks
Options[_]: gauge, growright
Enable SNMPv2c:
# Version 1 (default)
Target[192.168.1.12_28]: 28:test888@192.168.1.12:
# Version 2c
Target[192.168.1.12_28]: 28:test888@192.168.1.12:::::2
MRTG Example: MRTG Example
Bandwidth Utilization Monitoring: Bandwidth Utilization Monitoring
Delay & Packet Loss: Delay & Packet Loss
IPerf: IPerf Client/server application that
Measures maximum TCP performance
Facilitates tuning of TCP and UDP parameters
Reports bandwidth, jitter, and packet loss
http://dast.nlanr.net/Projects/Iperf/
Performance Management Process: Performance Management Process (Diagram: the performance management cycle of Monitoring, Baseline, Detection and Optimization.)
Performance Matrix: Performance Matrix Traffic Matrix
Delay Matrix
Packet Loss Matrix
…….
Distributed Backbone Performance Monitoring Architecture: Distributed Backbone Performance Monitoring Architecture (Diagram: a management console and performance data collection agents distributed in the infrastructure.)
Data Collection Agent: Data Collection Agent Routers?
Embedded: If the router is strong enough, it’s ok
Dedicated routers: Shadow Router
Cisco 26xx/28xx is enough
Steady and easy to deploy
Mature software solutions
Servers?
Embedded: If the load of the server is not heavy, it’s good
Dedicated Servers: Test Server
Flexible: monitoring anything as you like
Easy: free tools are quite enough
Ping, traceroute, iperf, wget, beacon etc.
Low Cost: a normal 1U PC server is not as expensive as a router
Cisco Performance Measure Technology: Cisco Performance Measure Technology
Introduction of IP SLA: Introduction of IP SLA Allows users to monitor network performance between Cisco routers, or from a Cisco router to a remote IP device.
Embedded within Cisco IOS software, so there is no additional device to deploy, learn, or manage.
A dependable, scalable, cost-effective solution for network performance measurement.
Collects network performance information in real time: response time, one-way latency, jitter, packet loss, voice quality measurement, and other network statistics.
Multi-Protocol Measurement and Management with Cisco IOS IP SLAs: Multi-Protocol Measurement and Management with Cisco IOS IP SLAs
CERNET: Data Collection Agents Distribution: CERNET: Data Collection Agents Distribution (Diagram: a console and server at the National Center, with an agent at each PoP attached to the core and access layers.)
Tools and Technologies Used: Tools and Technologies Used Ping
Traceroute
Snmp
telnet
FreeBSD
Perl
Rrdtools, GD
Multicast beacon
Iperf
Etc.
Performance Metric Example: Packet Loss: Performance Metric Example: Packet Loss
Performance Metric Example: Delay: Performance Metric Example: Delay
Performance Metric Example: Multicast: Performance Metric Example: Multicast
Thank You!: Thank You! Some materials are from the Internet; thanks go to the authors!