Level: Introductory
Nigel Griffiths (nag@uk.ibm.com), pSeries Advanced Technical Support, IBM
Edward Boden (EdwardBoden@uk.ibm.com), pSeries Advanced Technical Support, IBM
Dave Williams (mmccrary@us.ibm.com), pSeries Advanced Technical Support, IBM
15 Jun 2002
Updated 10 May 2004
These free tools collect and display a huge amount of information about the workloads on your pSeries servers. Even though IBM doesn't officially support the tools and you must use them at your own risk, you can get vital information about which servers are over-utilised and which are under-utilised.
Introduction
Usage notes: The ncp and nweb tools are NOT OFFICIALLY SUPPORTED. No warranty is given or implied, and you cannot obtain help on them from IBM. To place your name on an e-mail list for updates, contact Nigel Griffiths. The tools are the following:
- ncp - Nigel's Capacity Planning tool to collect data for capacity planning, server consolidation, and workload balancing purposes.
- nweb - Nigel's Web Server tool to display the collected data in a Web browser.
To use ncp and nweb, you need a single working directory to contain the programs and data. Because ncp and nweb use regular UNIX tools, they do not require you to have root access unless you need to create a user.
Server capacity planning and server consolidation
Capacity planning and server consolidation are not easy. These are large projects that require you to collect data about a computer system and then analyse the data to obtain useful information. But what is useful?
- Some machines have growing usage and it is important to know when resources will run out.
- Some machines might have free resources that can be better utilized.
- Sometimes workloads are merged onto fewer machines, and it's important to know what is currently in use.
If you collect raw data, you will probably find the data is complex because there are many factors to remove from the data before you can spot any trends. For example:
- Each day of the week has different patterns of use.
- Each hour of the day has different uses in terms of online users, batch runs, and backups.
- Underlying all of these uses is the growth trend.
Introduction to capacity planning in four parts
Capacity planning can be broken into four parts. The ncp tool offers help in most of these areas and requires very little effort to use:
- Collecting data
- Analysing data
- Visualising and graphing data
- Planning for upgrades and workload growth
Along with the descriptions, we show what ncp offers for each of these activities.
Collecting data
For capacity planning and server consolidation data, you need to collect the performance statistics of the machines for as long as possible so that you can spot the trends. There are many ways to collect data including UNIX commands, freeware performance data gathering tools, special daemon processes, and third-party products. However, all tools should do the following:
- Make the minimum CPU impact.
- Put no limits on the data collected.
- Ensure maximum operating system compatibility.
- Maximise reliable and safe data collection.
The ncp tool uses standard UNIX commands to gather data and saves the information directly to a file. The tool uses:
- vmstat mainly for CPU usage but also for run queues and paging
- iostat for disk busy and I/O rates
- df for JFS usage
The level of detail in the collected data is important. For long-term analysis over weeks or months, you do not need statistics captured at intervals of seconds or minutes. For most purposes, a sensible level of detail is once per hour, which lets you analyse the trends across the day. Even though small temporary peaks in workload and dips in response times are ignored, you don't need this level of detail unless you are doing performance tuning instead of capacity planning.
The ncp tool uses the UNIX commands to report hourly statistics, a frequency that minimises the volume and the CPU power needed to gather the information. The size of the data depends on the number of disks and filesystems. Typically:
- df = 1KB per day
- vmstat = 2KB per day
- iostat = 25KB per day
A good working average is 30KB per day times 365 days = approximately 10MB per year of raw data collected.
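The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: the per-day sizes are the typical figures quoted above, and the function name yearly_mb is our own.

```python
# Typical per-day raw data sizes quoted above, in KB. Real sizes depend on
# the number of disks and filesystems on the machine.
DAILY_KB = {"df": 1, "vmstat": 2, "iostat": 25}


def yearly_mb(kb_per_day, days=365):
    """Convert a daily volume in KB into a yearly volume in MB (1 MB = 1024 KB)."""
    return kb_per_day * days / 1024


typical = sum(DAILY_KB.values())  # total of the three typical figures
print(f"typical figures: {typical} KB/day -> {yearly_mb(typical):.1f} MB/year")
print(f"working average: 30 KB/day -> {yearly_mb(30):.1f} MB/year")
```

Either way the raw data stays at roughly 10MB per year, so disk space is unlikely to limit how long you can keep collecting.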
Analysing data
Raw data is fine, but it needs to be organised and summarised to be useful. Because most systems show different use patterns during the course of a day, it's useful to collect hourly data. As discussed below, you need to extract information on both "hours of the day" and "days of the week" so that you can find the real trends amongst the otherwise confusing data.
The raw output of the UNIX commands is not particularly friendly for analysis. It also contains repeated or cryptic headings. The ncp tool summarises the raw data in a small database file from which the graphs are generated. The database summary can also be useful for problem determination.
Because the data is analysed for particular days of the week, there is little point in looking at the graphs before four to six weeks have passed. Capacity planning requires collecting data over a considerable period of time.
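To illustrate what organising the data by "hours of the day" and "days of the week" means, here is a minimal Python sketch. It is not how ncp builds its database file; the function name weekly_profile, the input format, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def weekly_profile(samples):
    """Average CPU busy % for each (weekday, hour) cell.

    samples: iterable of (datetime, cpu_busy_percent) pairs, one per hour.
    Returns {(weekday, hour): mean}, where weekday 0 is Monday.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for when, busy in samples:
        cell = totals[(when.weekday(), when.hour)]
        cell[0] += busy
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical hourly samples: two Fridays at 13:00 and one Monday at 09:00.
samples = [
    (datetime(2004, 4, 30, 13), 80.0),
    (datetime(2004, 5, 7, 13), 90.0),
    (datetime(2004, 5, 3, 9), 40.0),
]
profile = weekly_profile(samples)
print(profile[(4, 13)])  # mean of the two Friday 13:00 samples
```

With only one or two samples per cell the averages mean very little, which is why several weeks of data are needed before the graphs are worth reading.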
Visualising and graphing data
Because so much data is collected, graphing is the only sensible way to understand it. Also, the data contains so many dimensions that you need to graph it in many different ways in order to see and extract all the possible meanings of the data.
From the graphs, you can explore the underlying trends in the workloads, which usually means looking for workload and resource usage growth. From the collected data, you can graph the past and current usage.
Future usage is much harder as predicting the future has always been a problem. Predictions can be attempted in three ways:
- Using linear mathematics to project the next few points on the graph. This exercise is moderately difficult.
- Using queuing theory and making some large assumptions to predict the performance of workload growth. This exercise uses complex mathematics, and the assumptions are broad.
- Simply studying the graphs and "using your eye" to predict the next few points. This exercise is easy and yields useful results.
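As an illustration of the first approach, the following Python sketch fits a straight line to past values by ordinary least squares and extends it by a few points. The function names and the sample history are hypothetical, and a straight line is itself a large assumption about how a workload grows.

```python
def fit_line(ys):
    """Least-squares fit of y = a + b*x for x = 0, 1, 2, ...; returns (a, b)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def project(ys, steps):
    """Extend the fitted line by `steps` points beyond the end of ys."""
    a, b = fit_line(ys)
    n = len(ys)
    return [a + b * (n + k) for k in range(steps)]


# Hypothetical monthly CPU averages (%) for one day-of-week and hour.
history = [75.0, 77.0, 79.0, 81.0]
print(project(history, 2))
```

Projections like this are only as good as the assumption that past growth continues unchanged, so treat them as a guide rather than a forecast.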
The ncp tool uses the third method by graphing the data, which was collected into the database file described previously, as a simple matrix. Due to the complex nature of workloads (different workloads during different hours and weekly trends in usage of users and batch runs) the data is graphed in a number of different ways to allow you to see the trends. Following are the three ways you can capture and graph the data:
- To capture raw data. Type ncp -c dir to start the vmstat, iostat, and df commands in the background. The output goes to sub-directories of the dir directory called vmstat, iostat, and df. The filenames in each directory are YYYY_MM_DD.
- To merge data and generate graphs. Type ncp -g dir to read the raw data and create a summary in the file dir/database.cpu. This file is then used to generate the graphs. The graphs are in HTML. (Note that this method abuses the HTML standard, which does not support graphs.)
  The ncp tool creates one-cell tables of a particular background colour that contain a transparent gif image. You can resize the gif image and use it to create a bar of a specific size. Lots of these bars are contained in a table to make the bar charts in a neat graph layout. To make creating the bar simple, the HTML files contain JavaScript commands to generate the tables on the fly.
- To visualise the data. A web server can supply the HTML files. You can use nweb, which is a very simple and safe web server, or you can use a regular web server.
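The table-and-transparent-gif trick described above can be sketched as follows. This is not the code that ncp ships: ncp generates its tables with JavaScript inside the HTML files, whereas this sketch builds static HTML in Python. The function names, colour, and pixel sizes are assumptions; only the dot.gif file name comes from the tools.

```python
def bar_cell(percent, colour="#cc0000", max_px=200, height_px=12):
    """Return HTML for one bar: a one-cell table whose background colour
    shows through a transparent GIF stretched to the bar's width."""
    percent = max(0.0, min(100.0, percent))  # clamp to the range 0..100
    width = round(max_px * percent / 100)
    return (
        '<table cellspacing="0" cellpadding="0"><tr>'
        f'<td bgcolor="{colour}">'
        f'<img src="dot.gif" width="{width}" height="{height_px}" alt="{percent:.0f}%">'
        "</td></tr></table>"
    )


def bar_chart(rows):
    """rows: list of (label, percent). Returns an HTML table of labelled bars."""
    lines = ["<table>"]
    for label, percent in rows:
        lines.append(f"<tr><td>{label}</td><td>{bar_cell(percent)}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(bar_chart([("Mon", 40), ("Fri", 90)]))
```

Because the image is transparent, the browser shows only the cell's background colour, so a single tiny gif file can draw bars of any length and colour.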
Planning for upgrades and workload growth
In general, you may encounter two occasions when you need to upgrade:
- To make sure the system maintains response times or batch and backup windows while the data volumes, users, or transaction rates grow over time.
- To merge workloads from various machines onto a single machine.
From any one configuration, you have a limited but sensible set of upgrade options:
- CPU. As most IBM eServer pSeries machines are SMP based and are not shipped with the maximum number of CPUs, you can add more. Also, as technology improves many pSeries machines can take the newer faster CPU option (once the model has been available for a year or so).
- Memory. Most pSeries machines are initially configured with between 25% and 33% of the maximum memory, so you can add more.
- Disks. With IBM SSA, FAST disks and ESS/EMC/HDS disk subsystems, which are connected via fibre channel, it is extremely simple to add more disks (and connections, if necessary). With SCSI disks, you can add some disks, but you may have to add extra SCSI adapters too.
- Adapters. All except the smallest machines have high numbers of adapter slots. At the high end of the pSeries range, you can add more I/O drawers.
- Network. Either extra adapters or faster network adapters can help with network limitations.
Most upgrades involve sensible options like the following:
- Adding a pair of CPUs or more.
- Adding memory in 1GB, 2GB, or 4GB units.
- Making more disk storage available in 100GB or four disk-pack chunks.
In planning the upgrade, you just need to decide when to do it. The decision on how much to upgrade is relatively simple.
When planning for workload growth in capacity planning, three types of workloads are interesting to study.
The Dolly Parton workload
The online user workload tends to have two peaks, one in mid-morning and one mid-afternoon with a dip at lunchtime. Informally, this pattern is called the Dolly Parton curve. For capacity planning, it is the peaks that are important. It is vital to find out which peak is higher, the time that the peak happens, and which days of the week are particularly large. The ncp tool can tell you this information. After you have established the peak hour and day of the week, you need to study this peak for trend information to understand the peak resource usage.
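Finding the peaks can be sketched in a few lines of Python. This is an illustration only, not how ncp reports them: the function names, the dictionary layout, and the figures are hypothetical, and the split at midday is an assumption.

```python
def busiest_cell(profile):
    """profile: {(day, hour): mean CPU busy %}. Returns ((day, hour), value)
    for the busiest cell."""
    return max(profile.items(), key=lambda item: item[1])


def session_peaks(profile, day):
    """Return ((hour, value), (hour, value)) for the morning and afternoon
    peaks of one day. Assumes the day has data both before and after 12:00."""
    morning = {h: v for (d, h), v in profile.items() if d == day and h < 12}
    afternoon = {h: v for (d, h), v in profile.items() if d == day and h >= 12}
    by_value = lambda item: item[1]
    return max(morning.items(), key=by_value), max(afternoon.items(), key=by_value)


# Hypothetical averages showing a lunchtime dip on Friday.
profile = {
    ("Fri", 10): 82.0, ("Fri", 12): 55.0, ("Fri", 14): 88.0,
    ("Mon", 10): 70.0, ("Mon", 14): 65.0,
}
print(busiest_cell(profile))
print(session_peaks(profile, "Fri"))
```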
Workloads for batch windows and back-up window growth
Both of these workloads tend (if well tuned) to use nearly 100% of the machine resources, particularly the CPU. These workloads start at a set time and then run till finished. For capacity planning with these workloads, the important area to study is the hour that you think will be the final hour of the run. If the batch or back-up job is taking longer than system resources can support, this so-called final hour will show growth until 100% and then the following hour will start being used. If you observe this final hour and watch the growth, you can determine the length of time the workload will take in the future.
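One rough way to turn hourly figures into a run length is sketched below in Python. It is a simplification, not part of ncp: it assumes the job is the only significant CPU user during the window, treats any hour at or above a hypothetical 95% threshold as fully used, and counts the final hour as its fraction of 100%.

```python
def batch_run_hours(hourly_busy, full=95.0):
    """Estimate run length in hours from hourly CPU busy % values, starting
    at the hour the batch job starts. Saturated hours count as one hour each;
    the first hour below `full` is taken to be the final, partly used hour."""
    hours = 0.0
    for busy in hourly_busy:
        if busy >= full:
            hours += 1.0
        else:
            hours += busy / 100.0
            break
    return hours


# Hypothetical run starting at 01:00: two saturated hours, then a final hour at 40%.
run = batch_run_hours([100.0, 98.0, 40.0, 5.0])
window = 4.0  # hypothetical batch window in hours
print(f"estimated run: {run:.1f} hours, fits window: {run <= window}")
```

Tracking this estimate week by week shows how quickly the run is approaching the end of its window.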
A typical capacity planning session
This section refers to the sample data graphs that the ncp tool provides and explains how to interpret them. At the top of the output is a series of links to specific graphs via individual buttons. In the middle is a colourised summary of the CPU statistics.
The main top-level web page looks like this:
From this page you can:
- Go to the collected data (after a few weeks this will contain useful data).
- Go to the sample data (which we look at here) to see the sort of data to expect.
- Read the documentation and installation instructions.
Selecting the Sample Data will take you to the details web page that includes:
- Machine configuration including the CPU number and type, memory and disks.
Configuration sample
System Model: IBM,7043-270
Machine Serial Number: 10ac7bd
Processor Type: PowerPC_POWER3
Number Of Processors: 4
Processor Clock Speed: 375 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: -1 NULL
Memory Size: 8192 MB
Good Memory Size: 8192 MB
Firmware Version: IBM,SPH99323
Console Login: enable
Auto Restart: true
Full Core: false
Network Information
Host Name: blue.aixncc.uk.ibm.com
IP Address: 9.137.62.2
Sub Netmask: 255.255.255.0
Gateway: 9.137.62.1
Name Server: 127.0.0.1
Domain Name: aixncc.uk.ibm.com
Paging Space Information
Total Paging Space: 1024MB
Percent Used: 1%
Volume Groups Information
==============================================================================
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 542 0 00..00..00..00..00
hdisk1 active 542 9 00..00..00..00..09
==============================================================================
INSTALLED RESOURCE LIST
The following resources are installed on the machine.
+/- = Added or deleted from Resource List.
* = Diagnostic support not available.
Model Architecture: chrp
Model Implementation: Multiple Processor, PCI bus
+ sys0 00-00 System Object
+ sysplanar0 00-00 System Planar
+ mem0 00-00 Memory
+ proc0 00-00 Processor
+ L2cache0 00-00 L2 Cache
+ proc1 00-01 Processor
+ proc2 00-02 Processor
+ proc3 00-03 Processor
* pci1 00-fee00000 PCI Bus
+ ent2 20-58 IBM 10/100 Mbps Ethernet PCI Adapter
(23100020)
+ mtn0 20-60 GXT3000P Graphics Adapter
* pci0 00-fef00000 PCI Bus
* isa0 10-58 ISA Bus
+ fda0 01-D1 Standard I/O Diskette Adapter
+ fd0 01-D1-00-00 Diskette Drive
* siokma0 01-K1 Keyboard/Mouse Adapter
+ sioka0 01-K1-00 Keyboard Adapter
+ sioma0 01-K1-01 Mouse Adapter
+ siota0 01-Q1 Tablet Adapter
+ paud0 01-Q2 Ultimedia Integrated Audio
+ ppa0 01-R1 CHRP IEEE1284 (ECP) Parallel Port
Adapter
+ sa0 01-S1 Standard I/O Serial Port
+ tty0 01-S1-00-00 Asynchronous Terminal
+ sa1 01-S2 Standard I/O Serial Port
+ tty1 01-S2-00-00 Asynchronous Terminal
+ scsi0 10-60 Wide/Fast-20 SCSI I/O Controller
+ rmt0 10-60-00-0,0 SCSI 4mm Tape Drive (12000 MB)
+ cd0 10-60-00-1,0 SCSI Multimedia CD-ROM Drive (650
MB)
+ hdisk0 10-60-00-8,0 16 Bit SCSI Disk Drive (18200 MB)
+ hdisk1 10-60-00-9,0 16 Bit SCSI Disk Drive (18200 MB)
+ tok0 10-70 IBM PCI Tokenring Adapter (14103e00)
+ ent1 10-78 IBM 10/100 Mbps Ethernet PCI Adapter
(23100020)
+ ent0 10-80 IBM 10/100 Mbps Ethernet PCI Adapter
(23100020)
+ scsi2 10-88 Wide/Ultra-2 SCSI I/O Controller
- The File System Use
- The CPU Utilisation colourised chart
This chart breaks out the CPU statistics as follows:
- Each row is a day of the week.
- Each column is an hour of the day.
The brighter the colour, the higher the CPU utilisation. The colour key at the bottom of the chart indicates the percentage of use.
On busy machines, you should typically see user workload peaks in mid-morning and mid-afternoon, and busy batch run periods after the working day and after midnight. You will also be able to see which days of the week are busiest.
Look for white, yellow, and orange cells because these colours indicate busy periods. Note that these cells show the averages for the whole data collection period, but the percentages in the last week (or so) could well be much higher than average.
Select the button in the cell you are interested in to get the full details for that hour or day. Look for the following trends:
- For online user workloads, it is recommended not to go beyond roughly 75% to 85% of the CPU. Otherwise, the response times can increase rapidly.
- For batch and back-up workloads, the last hour is used more and more as the workload increases. Once it reaches 100%, the following hour will start being used. Compare the trend with the available batch/backup window to determine when it will impact the business processes.
In the sample data, note that:
- Online peaks are at roughly 0900 to 1000 (9 am to 10 am) and 1300 to 1500 (1 pm to 3 pm).
- Batch runs start at 0100 (1 am) and 2100 (9 pm).
- Friday is the busiest day.
- Batch overruns can be seen at 0300 to 0400 (3 am to 4 am), particularly on Fridays.
If you click on the button in the cell that you are interested in, you will see the graph for just that day of the week and that hour of the day. For example, the Friday 1300 graph looks like this:
From this you can see that the CPU utilisation for this particular hour has grown from 75% to 90% over the last 9 months.
- The Disk I/O Utilisation colourised chart
This works in the same way as the CPU Utilisation chart, but note that the percentages are worked out relative to the peak disk I/O workload found in the data. The peak is noted in the title line (in this example, 75000 KBytes/second).
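Scaling to the peak can be sketched as follows in Python. The function name and the sample rates are hypothetical; only the 75000 KBytes/second peak comes from the example.

```python
def percent_of_peak(rates_kb_per_s):
    """Scale disk I/O rates to percentages of the largest rate in the data.
    Returns (peak, percentages). Expects a non-empty list."""
    peak = max(rates_kb_per_s)
    if peak <= 0:
        return peak, [0.0 for _ in rates_kb_per_s]
    return peak, [100.0 * rate / peak for rate in rates_kb_per_s]


# Hypothetical hourly rates in KBytes/second, with the sample peak of 75000.
peak, percentages = percent_of_peak([15000, 37500, 75000])
print(peak, percentages)
```

Note that 100% on this chart means "as busy as the disks have ever been in the collected data", not "the disks are saturated".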
Other graphs
Following are the specific graphs you can use:
- CPU Day of the Week. A breakdown of the days of the week during the day and during the night to highlight your busy trends.
- CPU Hours. A breakdown of the hours of the day for each day of the week to highlight your busy trends.
- CPU Daily Average. A long graph with all the days. This graph is useful if your usage does not have any daily or hourly trends. For workloads that are not users online and batch runs, this graph may prove quite useful to show, for example, machines used for running only batch reports or DSS machines that run long term.
- SQL statements. A graph to help you spot a long-term growth trend and patterns of use.
- CPU Daily Average by Day. A long graph with all the days broken out by day.
Limitations of the ncp tool
The following limitations apply to the current release:
- CPU stats are used and analysed.
- JFS Filesystems stats are used and analysed.
- The data for memory and disks is collected and can be analysed in later releases.
- Some boundary conditions are suspect. For example, are we working out which day is Monday correctly, and what happens if statistics are missing in the middle of the data?
Installation
This section describes how to install both ncp and nweb.
To use ncp and nweb, you need a single working directory to contain the programs and data. As ncp and nweb use regular UNIX tools, you do not require root access unless you need to create a user.
Which directory?
There are four ways to set up ncp and nweb. We recommend and assume the first method:
- Create a new user. Create a user and place all the files in this user's home directory. This user should own all the files, which keeps them safe and secure.
- Use the tool as root. Some system administrators may prefer to use /var/perf/ncp or a similar system directory for holding system type data.
- Use an existing web server. If you already have a web server running on your machine, then you could place the ncp directory in its web pages directory. This method will let you view the ncp-generated graphs with your regular web server. In the top-level directory of ncp there is an index.html file.
- Collate the files via NFS. If you have a large number of machines and use NFS mount directories, you could use NFS to bring all the graphs to a single machine for display in a single web server. You can either use ncp to collect data to an NFS directory (with different sub directories for each machine) or use NFS mount from the ncp directory on each machine to one central server.
Setting up ncp
In the following instructions, the new user is called ncp. Before installing the tools, you must FTP the installation ncp.tar file containing the code and data.
First, create the user called ncp: run smitty user, select Add User, enter ncp as the user ID, and enter Capacity Planning as the user details.
Next, move the ncp.tar file to the /home/ncp directory and then take the following steps:
- Log in as the ncp user.
- Make sure you are in the home directory with:
cd /home/ncp
- Untar the code and data to the current directory with:
tar xvf ncp.tar
- Check that ncp runs correctly by outputting the help information with:
./ncp -?
- Check that you have a number of .html files with:
ls -l
- Make directories and sample data files with:
./ncp -x
Now ncp and nweb are ready for use.
Setting up ncp data collection
To collect CPU performance data, run the ncp tool every day starting at 00:01 (just past midnight). The simplest way to do this is to use a cron job. The steps that follow will set up ncp to collect data every day (mandatory) and generate the graphs once a day. As the ncp user (or root if you are using /var/perf/...), do the following:
- Start editing the cron entry (vi is the default editor) with:
crontab -e
- At the bottom of the file add:
1 0 * * * /home/ncp/ncp -c /home/ncp/ncp
- At the bottom of the file add:
0 1 * * * /home/ncp/ncp -g /home/ncp/ncp
- Save the file and exit with:
:wq
Now wait a month or more for some useful data to be collected and meaningful graphs generated.
Note: Data collection must run every day, but you could reduce graph generation to once a week or even once a month by changing the ncp -g cron line.
Setting up nweb
To see the graphs, you have to make them available to a web browser. The simplest way is via a web server. Displaying the graphs is the purpose of nweb, which is a tiny passive web server that is highly secure because it does not support any fancy features. It can only serve .gif, .jpg, .jpeg, .png, .zip, .gz, .tar, .htm and .html files. For obvious security reasons, it does not allow web page names that include the parent directory name "..".
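The two restrictions described above can be expressed as a small request filter. This Python sketch only illustrates the rules; it is not nweb's own code, and the function name is_servable is hypothetical. Rejecting any path that contains ".." anywhere is a deliberately conservative assumption.

```python
import posixpath

# File types the article says nweb can serve.
ALLOWED_EXTENSIONS = {
    ".gif", ".jpg", ".jpeg", ".png", ".zip", ".gz", ".tar", ".htm", ".html",
}


def is_servable(url_path):
    """Return True if the requested path has an allowed extension and never
    refers to a parent directory."""
    if ".." in url_path:  # conservative: reject ".." anywhere in the path
        return False
    extension = posixpath.splitext(url_path)[1].lower()
    return extension in ALLOWED_EXTENSIONS


for path in ("/index.html", "/../etc/passwd", "/script.cgi"):
    print(path, is_servable(path))
```

Refusing everything that is not on a short list of known file types is what keeps a server this small easy to reason about.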
Starting nweb
To start nweb, take the following steps:
- Start /home/ncp/nweb [portno] /home/ncp, where portno is an unused TCP/IP port number. For more information, see /etc/services.
Note: The nweb tool reports if the port number you type is already in use, so you will not mess anything up by using the wrong port number (8181 seems to work for us). If you don't have a web server running on the machine, you could use 8080, which is a commonly used alternative web server port number.
- So you would actually type /home/ncp/nweb 8181 /home/ncp.
- With your browser go to http://[yourmachine]:8181/index.html.
Note: You have to identify the machine correctly, using either its IP address or its hostname. Examples include:
http://9.123.45.67:8181/index.html
http://my.machine.company.com:8181/index.html
Wait one month or more to gather capacity planning data. With your browser, check the data again and start capacity planning. Note that we have tested nweb only with Netscape 4.7 and IE 6.
Upgrading from earlier ncp versions is easy, but please see the install.txt file in the tar file below.
Download
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample tar file for this article | es-ncp.zip | 29KB | HTTP |
About the authors
Nigel Griffiths works in the IBM eServer pSeries and specialises in performance, Linux, sizing, tools, benchmarks, and Oracle RDBMS. The ncp and nweb tools were developed to support server consolidation projects and for system administrators wanting to better understand what their machines are up to with little effort and zero cost. You can contact him at nag@uk.ibm.com.
Ed Boden works in IBM eServer. He was involved with ncp from the outset, in particular kicking concepts and ideas around until we had a workable and useful design. You can contact him at EdwardBoden@uk.ibm.com.
Dave Williams contributed the web server guts, testing, and dot.gif. Dave works in the IBM eServer pSeries Technical Support -- Advanced Technology Group. Dave has been working on AIX since 1984 (yes, well before the RS/6000) and is mainly involved with new product introduction like the POWER5 in EMEA.