CTC Definitions for z Systems

I don't mean to offend my IBM brethren, but the Channel-to-Channel (CTC) documentation in the IOCP book is abysmal. Don't get me wrong; all the information you need is there, but it's kind of spread out. If your configuration is exactly like the examples, you're all set. Otherwise you have a lot more homework to do.

What I intend to do here is to blow away the fog and the seeming illogic of the I/O definitions for FICON CTCs. Rather than hand you a fish, I want to teach you how to fish. I want you to be able to configure CTCs for any environment, not just the ones in the books!

To begin, I need to turn back the clock and talk about the basic tenets of CTCs back in the days of S/360 and S/370. The CTC protocol hasn't changed much since then, but the implementation has changed a lot!

Conceptually, a CTC adapter (CTCA) is a double-sided control unit that pairs together the unit addresses on opposite sides of the adapter. In this picture, the unit addresses are numbered 0, 1, 2, and 3.

In this example A28 is paired with EA0, A29 with EA1, A2A with EA2, and A2B with EA3.

Refresher: Back in those days, the device number referenced by the host was the real device address: CUU (channel, unit, unit). The UU was was 0-n, plus the control unit address. So unit address 5 on control unit C0 attached to channel 3 would be referenced as 3C5. Different control units had different rules for what was a valid CU address.

Later, the standalone single channel-pair CTCA was replaced by the IBM 3088 Multichannel Control Unit. This control unit contained (wait for it....) multiple CTCAs that could interconnect the channels on an any-to-any basis. The low-order two digits of the device address determined which remote system you were talking to, as predefined by the IBM 3088.

The above ia logical representation. Physically, everyone connected to the IBM 3088:

     RUBY                                         SAPPHIRE
   .--------------------.       CUADDR= -.       .--------------------.
   |    To        Use   |         ______ V       |   Use       To     |
   | SAPPHIRE : 800-803 |        |      |C0      | 2C0-2C3  : RUBY    |
   |                    |===== 00| 3088 |========|                    |
   | EMERALD  : 804-807 |        |______|        | 2C8-2CB  : EMERALD |
   |                    |          || 20         |                    |
   '____________________'          ||            '____________________'
                                   ||
                                   ||
                                   ||             EMERALD
                                   ||            .--------------------.
                                   ||            |   Use       To     |
                                   ||            | 724-727 : RUBY     |
                                   ::============|                    |
                                                 | 728-72B : SAPPHIRE |
                                                 |                    |
                                                 '____________________'

While there was a method to the numbering madness, few understood it. But we didn't have to since the 3088 manual contained a nice chart with all the relationships described. All you had to do was plug in some numbers, et voilà! Perhaps you remember filling out one of these and taping it to the side or cover of the 3088? I know I do!

Of course, as a VM guy, my chart was filled with things like VTAM, TSAF, PVM, and RSCS.

But once ESCON arrived on the scene, the 3088 faded into history. Both ESCON and FICON emulate the CTCA internally. With ESCON, you had to choose which side was the channel and which was the CTCA, but with FICON, you don't have to worry about that. The adapter makes intelligent decisions about which side should have which role.

Let's talk about FICON devices and how they are addressed. Consider that a FICON channel can communicate with thousands of devices. To communicate, every device needs to have a unique address so that the FICON frames are sent to the proper device.

On the host side we still have our trusty device number, but it's really just an indirect reference to a channel path, a control unit that can be reached by that path, and a specific device managed by that control unit. When the channel subsystem sees inbound data from a particular device unit, the channel subsystem can figure out which host device number will receive an interrupt and get the data. This information is all established via IOCP or HCD.

To wit:

That's kind of obvious, right? Right. But now I'm going to complicate things: Add a second LPAR. Well. Hmmm. See the problem? We need a way to differentiate A00 on LPAR 1 from A00 from LPAR 2.

Rather than complicating the device numbering scheme, IBM engineers decided to update the low-level channel protocols to include additional information that lets the channel subsystem manage I/O from multiple LPARs on the same channel. To go with that they invented the Multiple Image Facility (MIF). This is a function of the channel subsystem that adds the specific information needed on outbound frames to let the receiving system respond to the correct partition.

When you think about an IBM Z I/O configuration, consider:

The physical I/O configuration with all of the I/O adapters, their phsycial channel IDs (PCHID numbers), and cables.
The logical I/O configurations, or Channel Subsystems (CSSes), to which each partition is assigned. It's here that we find CHPID numbers, CUs, and devices. It's also here that we find access lists that control which LPARs can see which CHPIDs, CUs, or devices.
The channel image. This is the I/O configuration as viewed from inside a single parition. Access rights have been applied. Every LPAR has its own channel image.

To identifiy a specific channel image, we must know both the CSS and the Multiple Image Facility (MIF) ID of the partition within that CSS. The channel image ID uniquely identifies a partition in a machine.

The channel image ID is a two-digit hex value, where the first digit is the logical channel subsystem (CSS) ID to which the LPAR belongs, and the second digit is the LPAR's MIF image ID in that CSS. Both of those values are obtained from the LPAR's definition in the I/O configuration.

For example, given that I have three systems; RUBY, SAPPHIRE, and EMERALD:

     RESOURCE PART=( ( CSS(0),(RUBY,1),(SAPPHIRE,2) ) ,
                     ( CSS(5),(EMERALD,B)           )    )

We know that RUBY has channel image ID 01, SAPPHIRE has 02, and EMERALD has 5B.

The area of the FICON frame that contains the channel image ID was previously zero. Consequently, a MIF image ID of zero means that the CPC is running in Basic Mode and there's only one I/O configuration using the channel subsystem. And that, boys and girls, is why you can't assign an LPAR a MIF image ID of 0!

A thousand words:

The use of the channel image ID enables the channel subsystem to differentiate the I/Os performed by different LPARs using the same device number. The control unit and device don't care. They just return data to the channel image that requested it.

The arrival of ESCON on the scene signaled the end of physical CTC CUs. They were replaced by firmware running inside the channels themselves. One end of the connection played the role of the channel and the other end played the role of the CU.
That's all well and good, but it creates a bit of a CU addressing problem. How are you supposed to address a seemingly non-existent CU? As it turns out, by the channel image ID I described above. So if RUBY wants to talk to EMERALD, the channel image that RUBY is using must have a CU that references the channel image that EMERALD belongs to. Likewise, EMERALD must have a CU that references RUBY's channel image. It won't work without BOTH definitions.
Let's look at that in more detail.
You can see the three of our four LPARs, Ruby, Sapphire, and Emerald.

Each LPAR is using a point-to-point (PTP) FICON connection to talk to the other LPARs. This is required even if you want an LPAR to talk to itself!
The channel image ID move back and forth on a CTC, there must be some kind of addressing information in the layer 2 CTC frames sent on the FICON channel. Sure enough, each frame includes an origin channel subsystem (CSS), control unit (CU) and unit address (UA), as well as a a target CSS, CU, and UA. They perform the same function as MAC addresses in an ethernet network.
In fact, a switched FICON connection is essentially a non-broadcast multi-access fabric (a WAN), and a PTP FICON connection is like a LAN. But in either case, they require that frames contain origin and target communications endpoint addresses.
This means that Sapphire has information in its IOCDS about Ruby, and Ruby has information about Sapphire.
In particular, Sapphire has to know the CSS and and MIF ID of each of Ruby's LPARs which which any of Sapphire's LPARs will communicate. Bottom line, the sender is able to send because it has a definition for the receiver, and the receiver is able to correlate the incoming data because it has a definition of the sender.
A first glance, this seems very strange, but I guess it only seems that way because I always think of your average FICON control unit as being attached to the local CPC. Well, it isn't. Rather, the FICON chpid is just a gateway to the outside world and the control units have destination addresses to differentiate them from the others connected to the chpid. (LAN semantics, again!)
Which means the only truly odd aspect is that we have a configuration item in the IOCDS that represents the CTC control unit's view of the world. We don't usually see that sort of thing since it's normally handled in the separate control unit software, but the CTC CUs are integrated into the channel subsystem.
Here's the important part: The CU is the MIF ID of the origin and target LPARs. There is no real CTCA any more, so the CU numbers are at the discretion of the folks who designed the integrated CTCAs, and they said "CU=" is how it Shall Be Done. So mote it be.
Let's start off with the basic assumption that you've managed to physically connect Sapphire's chpid 63 to Ruby's chpid 73 with a single cable.
In order for Sapphire S4 to SEND data on a CTC to Ruby R1, it needs three pieces of information

The FCTC chpid on Sapphire that is connected to Ruby,
The CU on that chpid that points to LPAR R1,
The unit address on that CU that I will use to SEND the data.

In order for LPAR R1 to RECEIVE data from LPAR S4, I need three pieces of information

The chpid on Sapphire that is connected to Ruby,
The CU on that chpid that points to LPAR R1,
The unit address on that CU that I will use to SEND the data.
that will will RECEIVE the data Between the two ports is the CTC Vanishing Point where the CTC data stops being transmitted and starts being received. It effectively disappears from the sender's radar. This is true even when the sender and receiver are the same CPC! I mention this because you need to keep BOTH ends of the communications in mind as you work on configuraiton.
As the sender, Sapphire needs to put the receiver's MIF ID and a unit address into the CTC frame before it transmits it down the cable. Sapphire gets information from the IOCDS. Specifically, it uses the Cn there in the form of "At the other end of my chpid 63, you will find a paired chpid that has of this cable (one of Saphhire's chpids), you will find a CU defined with the MIF ID of one of Ruby's LPARs. You will queue this data on the device of that CU that has same unit address as the device address I'm using to send it. my MIF MIF ID x. The unit address on that In order to define the CTC, the IOCDSs on both sides must have information about the other side. CHPID 60 CU= CTC connections are created by a low-level Establish Logical Path operation in which the channel on one side and the CU on the other exchange information.
The local channel knows about the chpid on the receiver's CPC, and a unit address to identify a specific device on the target control unit. unit address to identify a specific multiple LPARs.
Each is connected to the other two via two FICON channel paths (cables).
Consider what has to happen when Ruby receives a frame of data from Topaz on a FICON port. Ruby has to answer two questions:

What LPAR gets the data?
What device number will be used to deliver the data?
"Obvious!" you say. If so, stop reading this and go do something constructive. Plant a tree. Give blood. Go outside and get some fresh air and exercise. Donate to your local animal shelter.
The rest of you, keep reading. (This is fascinating stuff, isn't it! Hello? Hello? Is this thing on?)
Where were we? Oh, right. Who gets the data. Or more specifically, which of Ruby's devices will receive the data?
It turns out that Ruby makes the decision, with some hints provide by Topaz. Just like in the olden days, we create a CTC by connecting Side A and Side B together with a cable. You know what? It's still true!
What's also true is that Side A and B are ignorant of each others' to talk to: LPAR x in CSS y. So a CTC frame is transmitted very similarly to an ethernet frame. It contains origin identification, target identification, and the data. (I'm oversimplifying, of course, but you can live with that.)

+----------------------+ | Origin CSS | +----------------------+ | Origin MIF ID | +----------------------+ | | : Data : | | +----------------------+

A Day in the Life of a CTC Frame.

To send a CTC frame from R1 to S1, a miracle occurs. No, wait, that's not right. No miracle. Just some simple logic.

The wetware tells R1 which device connects to S1, and tells S1 which device connects to R1. It seems trivial to figure that out, right? Yeah, well, if it were trivial, I wouldn't be writing this, would I?
There are auto-discovery mechanisms that are used, but they're not part of the CTC protocol, but are in the application itself.
Host R1 writes the data to the CTC device.
is associated with a CU on a CHPID that connects to the remote host (peer-to-peer or switched). The physical connection conveys the fame to e

The answer to the first question is simple: The other side included thme LPAR. How does it do it? The frame itself contains the information. How did it get there? The other side put it there, of course.
So, in order to send data, you have to address it to a specific unit address on a specific control unit in a particular logical channel subsystem defined to a particular LPAR. Just four simple pieces of addressing information.
As you likely suspected, every frame also includes that same information about the sender, so that when the frame arrives at the CPC, the I/O subsystem will locate locate the control unit definitions for the destination LPAR and will search for the CUADDR that matches the sending LPAR.
In order for two LPARs to talk to each other, each must have a (sub)channel that is talking to a CTC control unit. Of course, CTC control units don't physically exist any more, so we talk about a logical CTC control unit image. This CU image is imbedded in the CTC firmware in the z Systems FICON adapters. Each CU image you create via the CNTRLUNIT UNIT=FCTC statement is bound to a specific LPAR in a specific channel subsystem. The CUADD parameter identifies the CSS and MIF id of the target LPAR.
It follows that two LPARs can talk to each other only when there are two CUs define, one bound to each of the LPARs. FCTC subchannels on each LPAR will be associated with the CU of the peer LPAR, which may be on the same CPC or a different one.
In this treatise I use the phrase "local LPAR" to mean the LPAR that you are trying to manage. A "remote LPAR" is one that is trying to talk to the local LPAR. As perspective moves, don't forget which is local and which is remote.
Hopefully Alan's Rules for FICON (ARF, for short) will help. To wit:

There's no crying in System Programming. As Tom Hanks said in 1992's A League of Their Own, It's supposed to be hard. If it wasn't hard, everyone would do it. The hard is what makes it great!
Leave everything you know about configuring ESCON or parallel channels at home. They are dead and buried. Requiescat in pace.
CHPIDs on both sides are coded as TYPE=FC. No worries about which end of the connection will play Channel and which will play Control Unit. It's magic.
A single physical path can communicate with multiple LPARs, whether on the same or different CPCs.
There are two types of configurations:

Unswitched. Two FICON ports are physically connected to each other by a single cable. This is also known as a point-to-point or direct connection.
Switched. Two FICON ports are logically connected to each other through a FICON switch. A switched connection is only way you can connect two LPARs on the same CPC with a single channel.

And before you ask, no, there is no ability to wrap the CTC inside a shared FICON port like you can with shared OSA ports. (That would be cool, though, wouldn't it?)
Your local LPAR's I/O configuration must have a separate CTC CNTLUNIT definition for every remote LPAR that it wants to talk to. (Don't ask questions, yet. Just accept it for now.) That control unit will be associated with the CHPID you use to reach the destination.
with each CHPID that you will use to communicate with that target LPAR. (Of course, for high availability you will always have at least two.) You can put up to 256 CUs on a single FICON chpid, so that shouldn't be a horrible burden. The CNTLUNIT identifies the target LPAR via CUADD=xy, where x is the CSS in which the partner's physical path is placed, and y is the partner LPAR's MIF image ID (CP QUERY LPAR can help you for same-CEC configurations). If CSS is 0, you can just code CUADD=y.
If the FICON chpid on the remote system isn't defaulted or defined as SHARED in the I/O configuration, then you don't have to specify CUADD on the CNTLUNIT.
In a switched configuration you need to code the LINK in order to direct outbound CTC traffic to the proper physical path on the other side of the switch. In a loopback configuration, the target LINK will be the same LINK as the origin.

SSI and ISFC

Define at least four (4) CTC devices between any two LPARs. Doing so provides ISFC with two logical paths, each composed of two CTC subchannels. Why do you need two subchannels? While technically not required for correct operation, know that a CTC is a half-duplex device. That is, it cannot send and receive data at the same time. By using two CTCs (subchannels) on the link, we turn it into a full-duplex configuration, with one CTC always receiving and the other always sending.
By adding a second pair of CTCs (preferably on another CHPID for availability), ISFC can now provide differentiated services. One logical link can be used for small messages, and the other for large messages such as are created during guest relocation.
The full configuration of 16 CTCs, 4 per physical path (CHPID) provides the bandwidth needed to absorb a reasonable number of relocations.

The Unbreakable Rule: Sending and receiving CTC device pairs MUST have the same UNITADD value on their respective IODEVICE definitions. This isn't the same as having the same IODEVICE number, but by default the UNITADD will be the same as the last two digits of the device number.
There is a special place in Hell reserved for sysprogs who change UAs such that they aren't consistent with the device numbers. So think carefully before you get carried away by your own brilliance and cleverness.

z/VM can simulate FICON channels via CP DEFINE FCTC. Just beware that a peer guest not being logged on (CTC isn't coupled), doesn't cause the same conditions on the virtual CTC as would appear if the partner LPAR were down.
With the above rules in mind, I recommend that you establish a device numbering convention that enables you to identify the target LPAR based on the device number (address). Try to avoid just slapping it together. Instead, think about the furture and your growth plans, because changing your numbering convention after you have it deployed causes a pain in your gluteus maximus. And the longer you use a bad convention, the more work it takes to convert. Do your homework!
Here's a possible numbering convention that eliminates the requirement for you to code any UNITADD values. The device numbers are ABCD, where

A_even are SEND channels and A_odd are RECEIVE channels. Some communications subsystem require separate SEND and RECEIVE channels (e.g. TCP/IP), and others can use a single CTC channel. ISFC on z/VM can use a single channle, but if two are provided, it will automatically use one for sending and the other for receiving. If one of the CTCs is lost, then ISFC will place the remaining CTC in full-duplex mode. If you don't want to worry about send/receive pairs, then there is no restriction on A.
BC is the target CTC Image Number. This number is a unique value that you create and assign to every LPAR that will be using CTCs. Keep this list where everyone can find it and keep it up to date.
D is the particular CTC pairing. It matches on both sides. That is, xxx0 on LPAR 1 talks to yyy0 on LPAR 2. This means you can have up to 16 CTCs between two LPARs. If you need more that 16 CTCs, you need a different naming convention!

This is illustrated in the following example, in which we establish a set of point-to-point connections between each of three LPARs named RUBY, SAPPHIRE, and EMERALD.
Connecting them is a FICON patch cable between CHPID 4A in CSS 0 and CHPID 5A in CSS 1, and another between CHPID 4B and 4C in CSS 0.
While excluded here for simplicity, LPARs should be connected by a pair of connections using different physical components so that communication won't be lost in the event of a single component failure.

.--------------------.______ .--------------------. | RUBY | | | EMERALD | | CSS 0 | CU 2 |<================| CSS 0 | | MIF ID 12 |______| | MIF ID 1 | | | /\ ______| | | | || | | | | |================>| CU 3 | | | | || |______| | '____________________' || /\ '____________________' || || || || || || || || || || || || || || || || || || || || || || || || || || || || || .--------------------. || || | | || || | | || || | SAPPHIRE | || || | CSS 1 | || || | MIF ID 7 | || || '____________________' || || | | || '===================>| CU 1 |<===================' |______|

First, let's define our three LPARs. been specified.

     RESOURCE PART=( (CSS(0),(RUBY,C)),
                     (CSS(1),(SAPPHIRE,7)),
                     (CSS(0),(EMERALD,1))      )
h

That should be familiar and fairly straightforward. If everything is in CSS 0, then the syntax is a lot simpler.

Next we need to define FICON CTC control units, one for each LPAR we want to talk to.

So let's look at RUBY's I/O configuration. You're thinking, "We'll need 2 control units." But that would be wrong because I want RUBY to he able to talk to herself. (Don't worry, we judge computers differently than ourselves.)

     CNTLUNIT CUNUMBR=0001,
              PATH=((CSS(0),4A)),
              UNIT=FCTC,
              CUADD=17         <--- SAPPHIRE is in CSS 1 MIF id 7
 
     CNTLUNIT CUNUMBR=0002,
              PATH=((CSS(1),5E)),
              UNIT=FCTC,
              CUADD=0C         <--- RUBY is in CSS 0 MIF id C
 
     CNTLUNIT CUNUMBR=0003,
              PATH=((CSS(0),4B)),
              UNIT=FCTC,
              CUADD=01         <--- EMERALD is CSS 0 MIF id 1

Note the CUNUMBRs. Their values are arbitrary, but don't waste them. In this example, they represent CTC image numbers 1 and 2, as these are the first two I defined. Now, what you don't want is a conflict with any other CU numbering convention you have, so choose wisely. is the target. But if you follow the model above, you would use CUNUMBR to specify the target CTC image number.

Be careful about MIF IDs. There are 16 per channel subsystem (CSS), so they will repeat when you have multiple CSSes, but may not be associated with the same LPAR.

Every LPAR will use the same device addresses to talk to the same LPAR.

When only two LPARs are used, the same device number can be used on the both sides to talk to other side. That's a derivative case and won't work if you have three or more LPARs.

Finally, we need devices to talk on. Every LPAR that wants to talk to SPOT, needs devices defined on SPOT's FCTC CU. It's as simple as adding the LPAR to the access list. Every LPAR uses C00-C0F to talk SPOT.

     IODEVICE ADDRESS=(0C00,8),
              CUNUMBR=0001,
              UNIT=FCTC,
              PART=((CSS(0),FLUFFY),(CSS(0),FIDO))

If all LPARs need to talk to SPOT, the you might prefer to use

              NOTPART=((CSS(1),SPOT))

to allow everyone except SPOT to have C00-C0F.

And here are the I/O devices that will be used by SPOT to talk to FLUFFY's C00-C0F. They are also C00-C0F. Not expandable, but very easy to understand and implement.

     IODEVICE ADDRESS=(0C00,4),
              CUNUMBR=0002,
              UNIT=FCTC,
              PART=((CSS(1),SPOT))

And we add another CU and set of devices.

FLUFFY already talks to SPOT on her C00-C07 devices. Just add FIDO to the access list so he, too, will use Cxx to talk to SPOT.

     IODEVICE ADDRESS=(0C00,8),
              CUNUMBR=0003,
              UNIT=FCTC,
              PART=((CSS(0),FLUFFY),(CSS(0),FIDO))

SPOT and FLUFFY will talk to FIDO via their B00-B07 devices.

     IODEVICE ADDRESS=(0B00,8),
              CUNUMBR=0003,
              UNIT=FCTC,
              PART=((CSS(0),FLUFFY),(CSS(1),SPOT))

SPOT and FIDO will talk to FLUFFY via their A00-A07 devices.

     IODEVICE ADDRESS=(0A00,8),
              CUNUMBR=0003,
              UNIT=FCTC,
              PART=((CSS(0),FIDO),(CSS(1),SPOT))

But, wait! Were'nt we using C00-C07 for SPOT to talk to Fluffy? Note that SPOT used to talk to FLUFFY on Cxx. No longer. :-( This is why it's important get the convention right the first time. As you can see, you can only have the same device numbers on both sides when you have only two LPARs. Once you add a third LPAR, it's no longer a pairing. You now have a set between FLUFFY and SPOT, SPOT and FIDO, and FIDO and FLUFFY. At some point you can no longer hold the relationships in your head. Write them down!

So let's go back and use our model.

     IODEVICE ADDRESS=(0C00,8),
              CUNUMBR=(0001),
              UNIT=FCTC,
              PART=((CSS(0),FLUFFY))
 
I hope this has been of help to you.  Contributions and corrections welcome!

You can infer or discover these rules from hints in various pubs and
redbooks (http://www.redbooks.ibm.com/redpapers/pdfs/redp0158.pdf), but
hopefully having them laid out helps.
 
  
  

  
The information provided, and views expressed on this site are my own and do not represent the IBM Corporation.