When and Why We Format Disk Volumes

You know, I've been formatting disks for decades and for the longest time I never asked myself "Why?" It was just something that I was supposed to do. Growing up as a Child of VM, all I ever worried about was the CMS FORMAT command. In my mind all it did was (a) erase all the files, and (b) let me ACCESS it. But then I joined VM Development and did some work on the ACCESS command. My eyes were opened. Only then, at the tender young age of 23, did I begin to understand how file systems were really (not theoretically) developed.

Later, my childhood came to an end and I had to learn how the Control Program used disks for its own purposes.

I was lucky: I had other developers who knew these things. I had, essentially, mentors. Regardless of what anyone might say, true mastery of a complex system requires that you come up through the ranks, being first an apprentice, then a journeyman, and finally a Master of The System.

Well, that was then and this is now. A lot of folks are just being thrown into z/VM, to sink or swim as they will. People are being mentored through social media and conferences more than anything else. To that end, I will try to give you some fish to get you going, and at the same time teach you to fish so you don't starve tomorrow.

Let's start with a mini-refresher of how disk storage devices are organized.

ECKD

Extended Count-Key-Data (ECKD) devices have cylinders, tracks, and records. The first two are attributes of the device's data addressing architecture; the host cannot change them. The number of cylinders is variable and the number of tracks is fixed at 15 per cylinder. Data is written in records on each track. As many records as needed, on as many tracks and cylinders as needed to hold them, are used by the operating system to hold files and the file system metadata that supports them.

Each record can have a different length, but in the z/VM and Linux worlds, all records on the disk are 4096 bytes in length. We often refer to these 4K records as blocks so that they aren't confused with records as implemented by record-oriented file systems.

Except in z/OS, of course. There the file system holds application data records (LRECL) grouped together into blocks (BLKSIZE). Each block is typically written as a single ECKD record. In this way, you can see that ECKD architecture directly supports the MVS file system. All very confusing, but it's easy once you know how!

In z/OS, space is allocated by tracks. z/VM allocates space by cylinder. But it's all just convenience. There's nothing magical about either one.

FBA

Fixed-Block Architecture (FBA) devices, on the other hand, are organized without the concept of cylinders, tracks, or records. Instead, the disk is divided into 512-byte blocks that are sequentially numbered to provide direct access to any given block. Unlike ECKD, FBA disks do not require any disk records to hold data. It is for this reason that FBA is antithetical to the z/OS filesystem. The 4K blocks used by z/VM and Linux make the use of FBA simply a matter of figuring out how to find block x.
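To make "figuring out how to find block x" concrete, here is a minimal Python sketch of the arithmetic. It assumes the 4K blocks are laid end to end starting at some 512-byte block number; the function name and the start parameter are mine, purely for illustration, and say nothing about where CP, CMS, or Linux actually begin their data on a given disk.

```python
FBA_BLOCK_SIZE = 512                         # bytes per FBA block
PAGE_SIZE = 4096                             # bytes per 4K block
FBA_PER_PAGE = PAGE_SIZE // FBA_BLOCK_SIZE   # eight FBA blocks per 4K block


def fba_range_for_4k_block(x, start=0):
    """Return (first, last) 512-byte block numbers that hold 4K block x.

    start is a hypothetical offset: the FBA block number at which the
    run of 4K blocks is assumed to begin.
    """
    if x < 0 or start < 0:
        raise ValueError("block numbers must not be negative")
    first = start + x * FBA_PER_PAGE
    return first, first + FBA_PER_PAGE - 1


print(fba_range_for_4k_block(3))             # 4K block 3, no offset
print(fba_range_for_4k_block(1, start=16))   # 4K block 1, data starts at FBA block 16
```

No cylinders, no tracks, no records. Just multiply by eight and add.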

Of course, a storage controller can store and optimize the physical data in any way it likes. Blocks, cylinders, tracks, and records are part of the storage architecture, not necessarily the physical implementation.

Device Support Facilities (ICKDSF)

In the System z universe, we use the IBM Device Support Facilities program, ICKDSF, to manage ECKD and FBA disks.

Back in the Ancient Times, we had to use ICKDSF to do things like "park" or "unpark" the heads, or to perform an analysis of the media to diagnose I/O errors and assign alternate tracks or cylinders for the failing ones.

But these days, we don't worry about managing the heads (SSDs and FLASH drives don't even have them). I/O errors are handled by the RAID arrays automatically. That lets us limit our use of ICKDSF to formatting and erasing volumes, as well as the manipulation of Peer-to-Peer Remote Copy (PPRC) functions.

In z/VM, ICKDSF can be called directly or it can be used via the CPFMTXA command. My references in this document will be to ICKDSF directly. All references to the ICKDSF CPVOLUME command can be replaced by equivalent uses of CPFMTXA.

CP-Owned and User Volumes

Every dasd volume visible to CP falls into one of two categories: CP-owned or User. That was easy; only two things to remember!

A CP-owned volume is one that receives special treatment. First of all, it is explicitly defined to CP in what is called the CP-owned list. This list is a combination of the CP_OWNED statements in the SYSTEM CONFIG file and the results of the DEFINE CPOWNED command.

Secondly, the volume is formatted by ICKDSF CPVOLUME FORMAT.

The formatting operation creates an Allocation Map that indicates the purpose of each cylinder on the volume. Specifically, it identifies whether a cylinder is to be used for paging, spooling, temporary disks, the object directory, or minidisks used by CP at IPL (PARM disks).

If a cylinder isn't explicitly identified as one of the above, it will not be used by CP. This is called "PERM" space.
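One way to picture the allocation map is as a list of cylinder extents, each tagged with a purpose, where anything not covered falls through to PERM. The Python sketch below is only a mental model. The class, the function, the short labels other than PERM, and the example extents are all my own illustration; this is not the on-disk layout that ICKDSF writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    kind: str        # illustrative label, e.g. "PAGE" or "SPOL"
    first_cyl: int   # first cylinder of the extent
    last_cyl: int    # last cylinder of the extent (inclusive)


def usage_of(cylinder, extents):
    """Return the purpose of a cylinder, defaulting to PERM."""
    for extent in extents:
        if extent.first_cyl <= cylinder <= extent.last_cyl:
            return extent.kind
    return "PERM"    # not explicitly allocated, so CP leaves it alone


# Hypothetical volume: paging on cylinders 1-1000, spooling on 1001-2000.
example_map = [Extent("PAGE", 1, 1000), Extent("SPOL", 1001, 2000)]

for cyl in (0, 500, 1500, 2500):
    print(cyl, usage_of(cyl, example_map))
```

Notice that cylinder 0 and cylinder 2500 both come back as PERM: nobody claimed them, so CP won't touch them.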

The allocation map can be modified by the CPVOLUME ALLOCATE command, but it has no effect until CP re-reads the allocation map. And that only happens at IPL or when you ATTACH a CP-owned volume to SYSTEM.

While a volume can be used for multiple purposes, it is Best Practice to dedicate a volume to a single use whenever practicable. While it might seem a waste of disk space, it isn't. The system performs better when it doesn't have to keep switching a volume's role. This will continue to be true until the day comes that CP exploits the Parallel Access Volume (PAV) capabilities of the disk storage controller for CP-owned volumes.

User volumes are notable for their lack of CP-managed data. They contain the majority of the minidisks. The only requirement is that they have an IBM standard VOL1 label. That means you can use ICKDSF INITIALIZE or CPVOLUME FORMAT, or anything else that creates a VOL1 label. Beyond that, CP doesn't care what's on the volume or how it's otherwise formatted. These volumes are identified in SYSTEM CONFIG or manually attached to SYSTEM, but they are not placed in the CP-owned list.

Formatting a Volume

Formatting a volume places data on it. What kind of data? At the very least, a label. Labels come in a variety of formats, but any OS that wants to put a label on a real volume must use an IBM-standard VOL1 label. It will include the 6-character volser and some sort of Volume Table of Contents (VTOC). For a complete description of the label, see Anatomy of a CP-owned volume.

A CP-owned volume must be formatted by ICKDSF CPVOLUME FORMAT. In addition to a label, CPVOLUME FORMAT of an ECKD volume will fill the remainder of the volume with as many 4K records as will fit, each composed entirely of zeros. At twelve (12) 4K records per track, and 15 tracks per cylinder, that's 180 records per cylinder. These records are called slots.
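If you like to see the arithmetic spelled out, here is a small Python sketch of the slot math. It is a simplification: it pretends every track holds nothing but slots (ignoring the label and other records at the front of cylinder 0), it numbers slots from zero, and it assumes records on a track are numbered from 1. The function names are mine.

```python
RECORDS_PER_TRACK = 12      # 4K records per track
TRACKS_PER_CYLINDER = 15    # fixed for ECKD
SLOT_SIZE = 4096            # bytes per slot
SLOTS_PER_CYLINDER = RECORDS_PER_TRACK * TRACKS_PER_CYLINDER   # 180


def slot_location(slot):
    """Map a zero-based slot number to (cylinder, track, record).

    Assumes records on a track are numbered starting at 1.
    """
    if slot < 0:
        raise ValueError("slot number must not be negative")
    cylinder, rest = divmod(slot, SLOTS_PER_CYLINDER)
    track, record0 = divmod(rest, RECORDS_PER_TRACK)
    return cylinder, track, record0 + 1


def capacity_bytes(cylinders):
    """Bytes of 4K slot space on a volume of the given size."""
    return cylinders * SLOTS_PER_CYLINDER * SLOT_SIZE


print(slot_location(180))       # first slot on the second cylinder
print(capacity_bytes(3339))     # a 3339-cylinder volume, used here only as an example size
```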

Finally, CPVOLUME FORMAT writes the allocation map we talked about earlier along with the magic cookie "CPVOL". This is how CP knows that the volume contains an allocation map. If you look at cylinder 0, track 0, record 3, you will see it. CP will not accept a volume as CP-owned without the magic cookie. No CPVOLUME FORMAT, no CP-owned volume.

To z/OS, a CP-owned volume just looks like a volume with no empty space on it.

Disk Format Errors

When you format a CP-owned volume, format the entire thing. If you don't, you may end up seeing

HCP415E Six continuous paging errors have occurred on DASD rdev volume volser.

Allow me to interpret: Six consecutive paging operations to the indicated volume have failed since the last time you saw the message.

Why six? No one yet lives who now remembers. Maybe it was fine in 1975, but it doesn't cut it in 2014. Or 1990, for that matter. (I have my suspicions as to the origin of The Six, but maybe sometime I'll have proof!)

In any case, this is an area of the system that's ripe for improvement. Ideally CP would tell you what cylinders aren't formatted correctly and give you a way to fix it.

Each I/O error, not just the sixth one, will cause an error message to go to the operator's console. The error message is extremely geeky, but it will most likely be a unit check on a LOCATE RECORD command. And that, my friend, means one of two things:

You allocated but didn't entirely format the paging volume. (sigh)

Someone or something else (cue the eerie music) is writing on your paging volume in a way that is incompatible with z/VM life forms. I'll tell you now that if someone else writes on the volume, but doesn't change the format, the good news is there will be no I/O errors. The bad news is that all hell will break loose since the data that was paged out is now corrupted. If it's a file transfer operation, no one will even notice.

When HCP415E is issued, CP won't use that part of the disk again until you re-IPL. To fix it, you need to make the volume OFFLINE_AT_IPL (watch out for duplicate volids!) and re-IPL. After the system is up, vary it online, attach it to yourself, reformat it, detach it, then attach it to SYSTEM and CP START it.

I know, you'd like to CP DRAIN it, but that's a pipe dream. (Drain? Pipe? Get it? HA!) While the CP DRAIN command will prevent CP from adding any new or changed pages to the drained volume, it does not move unreferenced pages to another volume. This prevents the volume from completely draining and therefore it cannot be DETACHed from SYSTEM. So a SHUTDOWN REIPL is the only reasonable alternative.

Don't forget to remove the OFFLINE_AT_IPL entry so that it comes online automatically on the next IPL.

Common sense says that you don't need to format volumes twice, right? For example, you don't need to perform an ICKDSF INIT function on a volume that you are about to give to CP. It's in the wrong format. Likewise, I wouldn't run ICKDSF CPVOLUME FORMAT on a volume that is going to have CMS or Linux minidisks on it, or be used by z/OS.

I mean, it doesn't hurt to do it, but it wastes time if you are the one doing the formatting. The good news is that formatting disks can be done by your automation tools in the background.

NOTE: If you change the size of a CP-owned volume, any extents not covered by the existing allocation map will be treated as PERM space. If you want that additional space to be used by CP, you need to run CPVOLUME FORMAT against the new cylinders. If you don't format the additional cylinders, you will get the I/O errors mentioned above.
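Working out which cylinders are "new" is just subtraction, but off-by-one mistakes here are exactly how you end up with unformatted cylinders. A tiny Python sketch, assuming cylinders are numbered from zero and that the volume grew at the end (the function name is hypothetical):

```python
def cylinders_to_format(old_cylinders, new_cylinders):
    """Return the inclusive (first, last) range of added cylinders.

    Returns None if the volume did not grow. Assumes cylinders are
    numbered from 0 and the new space was added at the end.
    """
    if old_cylinders < 0 or new_cylinders < 0:
        raise ValueError("cylinder counts must not be negative")
    if new_cylinders <= old_cylinders:
        return None
    return old_cylinders, new_cylinders - 1


# Hypothetical example: a volume grown from 3339 to 10017 cylinders.
print(cylinders_to_format(3339, 10017))
```

The first new cylinder has the same number as the old cylinder count, because the old volume ran from 0 to count minus one.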

Sanitization: Residual Data Removal

Before a storage device is reassigned, it is Best Practice to remove any residual data on it before it is released for re-use. This applies to ALL storage devices that may have sensitive data on them. "Sensitive" is a broad term that applies to financial, personal, legal, classified, or any other data whose intentional or accidental disclosure might result in harm to individuals, businesses, government interests, or other institutions.

A storage device that has had data removed in such a way that the data cannot be easily retrieved or reconstructed is said to have been sanitized.

In "Guidelines for Media Sanitization", Special Publication 800-88, the United States National Institute of Standards and Technology (NIST) defines four levels of sanitization: disposal, clearing, purging, and destruction. Since mainframe storage is non-portable and represents a significant capital expense, we can't simply discard a disk or destroy it. Instead, we clear or purge the device.

Clearing a device means that the sensitive data on it is rendered unreadable by overwriting the data using standard write interfaces or by using some form of manufacturer-provided reset function. You can clear a volume by formatting it.

Purging data goes one step further by modifying the physical media in such a way that the data cannot be easily accessed or reconstructed using state-of-the-art techniques.

NIST has indicated that the preferred method for purge is a hardware-based function that performs the task, so that no software is needed to do it. Mechanisms include overwrites, cryptographic erasure, and destruction of encryption keys. (This is similar to the Data Security Erase function provided in most mainframe tape devices.)

If the vendor provides a way to sanitize a volume, use it. Otherwise use the ICKDSF TRKFMT ERASEDATA function, developed by IBM in cooperation with the National Computer Security Center, part of the National Security Agency.

    TRKFMT ERASEDATA -
           UNIT(dev)     -
           NOVERIFY      -
           CYCLES(1)     -
           CYLRANGE(0,nnnnn)

This will overwrite each track three times per cycle, alternating each bit on the track between 0 and 1. It does that by writing this bit pattern:

  0x924924   1001 0010 0100 1001 0010 0100
  0x249249   0010 0100 1001 0010 0100 1001
  0x492492   0100 1001 0010 0100 1001 0010
  0x924924   1001 0010 0100 1001 0010 0100
  0x249249   0010 0100 1001 0010 0100 1001
  0x492492   0100 1001 0010 0100 1001 0010
  0x924924   1001 0010 0100 1001 0010 0100

You can see the 1s are "drifting" through the data. Once the track has been overwritten, the records are removed from the track by the Erase command.
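The drift is nothing more than a one-bit rotation of a 24-bit value, which repeats every third row. The Python sketch below reproduces the rows shown above. It only illustrates the pattern; I'm not claiming this is how ICKDSF generates it internally, and the function names are mine.

```python
def rotate_left_24(value, count=1):
    """Rotate a 24-bit value left by count bits."""
    count %= 24
    value &= 0xFFFFFF
    return ((value << count) | (value >> (24 - count))) & 0xFFFFFF


def drifting_patterns(seed=0x924924, rows=7):
    """Return successive one-bit left rotations, starting with seed."""
    patterns = []
    value = seed
    for _ in range(rows):
        patterns.append(value)
        value = rotate_left_24(value)
    return patterns


for value in drifting_patterns():
    bits = format(value, "024b")
    nibbles = " ".join(bits[i:i + 4] for i in range(0, 24, 4))
    print(f"0x{value:06X}   {nibbles}")
```

Because the pattern has one 1 bit in every three positions, three rotations bring you back to where you started, and over those three rows every bit position gets both a 0 and a 1.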

Unless you have a specific requirement for multiple cycles, one cycle should suffice.

The Bottom Line

We know that a user volume (not in the CP-owned list) has two ultimate users. The system itself uses cylinder zero, and the virtual machine(s) use the rest.

If the volume is newly minted, fresh from the storage controller, then the disk is empty (erased) and it is immediately ready to be formatted by the ultimate user. So all you have to do is to run ICKDSF CPVOLUME FORMAT to write the label and allocation map.

If it's a "used" volume, then it may have residual data on it. It needs to be erased or formatted prior to releasing it to an untrusted virtual machine.

As long as residual data is removed (whether by erasure or formatting) before you give a virtual machine a minidisk, all will be well.

I hope this helps you understand more about the formatting of disks in z/VM.

The information provided and the views expressed on this site are my own and do not represent the IBM Corporation.