And Miles to Go Before I Sleep: The Air Force’s Stratification Problem


The woods are lovely, dark and deep,
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.
-Robert Frost

When reflecting on how to move our conversation forward, I was struck by the need to both acknowledge the progress made thus far and thank the many airmen, past and present, who have taken the time to write me. Your insights have made me a better military professional. Many expressed optimism because the Air Force is taking steps to push formal decisions about an officer’s promotion potential a few years later. Under Gen. David Goldfein’s leadership, for example, the Air Force will delay officer consideration for intermediate and senior professional military education and is shifting from a single-year look to a multi-year window of consideration for promotion to brigadier general. There has also been a small increase in the number of officers promoted on normal timelines (rather than ‘below the zone’) to positions as wing commanders—currently a key stepping-stone on the path toward general officer consideration. Additionally, more general officers seem to be sensitized to issues of toxic leadership and are removing or quietly sidelining an increasing number of problem commanders. Most importantly, there is a growing consensus that the system does need reform, and more people are talking about it in constructive ways. All that being said, hold off on scheduling the ticker-tape parade and don’t pull the “Mission Accomplished” banner out of storage quite yet. There is a long way to go, and one can argue that it begins with overhauling the Air Force’s evaluation system.

In my last article, I discussed the need to measure what we value in line with the stated desire of Gen. Goldfein. I pointed to the Air Force’s core values of “integrity first, service before self, and excellence in all we do” as the place to start. I offered 360-degree evaluations as a potential option. However, after more thinking and interactions with airmen, I’d like to spend more time on describing the problem with the current evaluation system and offering some suggestions for an improved system — and less time advocating for specific formats.

In my first article, I highlighted the human tendency of senior officers to promote those junior officers who most closely mirror their values and attributes. Supervisors and commanders do this via an entirely subjective order of merit within a peer group called “stratification” (for example, “My first of 20 captains” or “number four out of 15 squadron commanders”). Interestingly, the Air Force did not design stratification to be a part of the evaluation system. The stratification practice seems to have emerged to fill a void in the current evaluation system, which failed to draw adequate performance distinctions between officers. Consequently, the numerous Air Force panels and boards that convene to decide career-altering things (such as command candidacy, resident professional military education selection, and promotion) did not have an adequate means to make their decisions until commanders started ranking their subordinates. An officer in the Headquarters Air Force office responsible for personnel matters wrote a background paper that stated, “To date, no one can definitively pinpoint the genesis of stratification, but this methodology has spread so far that it is now (arguably) the #1 determinant of top AF officer talent at all levels.” So, in effect, the Air Force has a sole-source subjective assessment as the basis for evaluating and managing its officer corps.

Advocates for the existing system highlight the additional rater (i.e., the supervisor’s supervisor who formally evaluates their subordinate and their subordinate’s subordinate) as a check to ensure that the rater is evaluating their subordinates fairly. Proponents also point to the need to trust commanders to make those determinations. Many people also highlight the need to provide the Air Force board system some means to make its decisions about who makes the cut and who does not. These may appear to be strong arguments at first glance. However, consider this: When commanders are relieved of command, what happens to the evaluations they have written? They continue in the system and continue to influence officer management for years to come because there is no mechanism to identify those records affected by relieved commanders. This is not to say that outstanding performers cease to be outstanding simply because errant commanders rated them; however, it is equally true that one might rightly question decisions made by commanders who have been relieved for unprofessional behavior or other examples of flawed character and judgment.

Here’s another question: How does the additional rater decide how to evaluate an individual or decide if the rater’s assessment is fair? While approaches vary from commander to commander, the additional rater’s evaluation is usually a combination of the subordinate’s strength of record up to that point and how strongly the individual’s supervisor advocates for their subordinate to the additional rater. Anecdotally, I can tell you that, as a graduated group commander, I never even met my additional rater. Yet that general officer stratified me. The same thing happens at the wing level. Group commanders sit around the table and all but decide the order of merit of wing commanders’ stratification lists. Why? For the simple reason that the wing commander does not know – arguably cannot know – the many airmen being discussed. Here’s a final question to ponder: If a strong leader (charismatic, empathetic, empowering, risk-taking, etc.) works for a weak leader (socially awkward, cold, micromanaging, risk-averse, toxic, etc.), how confident can anyone be that the stratification is an accurate reflection of the subordinate? Whether it is a personality conflict or the senior officer feels professionally threatened, the assessment cannot be trusted. The bottom line is that stratification is problematic at best and just plain misleading at worst.

Instead of focusing on their airmen and the mission, many of our officers are preoccupied with stratification in an effort to join the “early promotion club.” Commanders get creative with stratifications in an effort to improve the prospects of their subordinates. The members of boards and panels spend an inordinate amount of time trying to decipher stratification techniques and figuratively wring their hands trying to figure out what, if anything, happened when an officer’s stratification changes from second out of 20 in one year to fourth out of 25 the next year. Meanwhile, most of those commanders will tell you there’s little difference among their top 10 percent of officers, so the advantage usually goes to the individuals meeting a board sooner than their peers. There’s also no way to standardize pool size or the talent among the pools. The third-ranked officer in one pool may be better than the top-ranked officer in another pool.

I can also tell you that more than one rater has told their subordinate that, while he or she is the stronger officer, another officer is getting the better stratification because the other officer “needs it more” to be promoted. While not exactly merit-based, this doesn’t sound that bad until you consider that all evaluations are a part of the permanent record. Consequently, those stratifications will affect those officers for the rest of their careers. What this means is that officer “A,” who was, in actuality, the top performer, now has average-looking records and so is never afforded an opportunity to command or gets passed over for promotion. Officer “B” is the weaker performer but has the stronger record, which leads to command opportunities and promotion. It is an incredibly shortsighted practice and happens more often than you would think.

The Headquarters Air Force manpower directorate paper on stratifications captured these issues:

Research conducted by the student think tanks at Squadron Officer School in 2016 and 2017 indicated that most of the officers corps feels stratification is subjective, inflated, inconsistent, biased, and provides ineffective feedback, promotes a ‘halo effect’, creates a perception of a ‘secret language’, and cross-[Air Force career specialty] incompatibility [sic].

Our junior officers understand the situation perfectly, so instead of keeping their focus on their subordinates, many aggressively pursue projects and opportunities that put them in close proximity to their leadership in an attempt to gain an advantage in the competition for stratification. These officers spend their time trying to look good up the chain of command because the current system rewards it. The Air Force needs to stop wasting time trying to fix stratification and focus on reengineering the evaluation system to better measure what we value. In the words of Peter Drucker, “There is surely nothing quite so useless as doing with great efficiency what should not be done at all.”

What, then, should the Air Force measure, and how should it measure it? Some of the current thinking on reforming the evaluation process seems to be coalescing around an evaluation based on the four major graded areas that comprise the unit effectiveness assessments: managing resources, leading people, improving the unit, and executing the mission. This approach makes sense if it reflects what we value, but I think it more easily applies to commanders than to staff officers. There are also questions on how best to measure those things. If the rater does the evaluation in those areas subjectively, the Air Force has not addressed the issue of a potentially flawed single source for performance evaluation.

Lest I become too prescriptive, allow me to propose some concepts, some “must haves,” upon which the Air Force can build a better evaluation system. First, the evaluation should measure what we value because behavior will shift to maximize what the promotion system rewards. Additionally, what the Air Force expects in its field-grade officers (majors, lieutenant colonels, and colonels) should be different from what it expects from its company-grade officers (lieutenants and captains). Company-grade officers are learning how to be good followers, developing small-unit leadership abilities, and building their technical expertise within their assigned specialties. Field-grade officers, on the other hand, should be able to mentor, coach, and lead company-grade officers while shifting from tactical expertise to operational applications and strategic thought. If the Air Force does not distinguish between company-grade and field-grade officer expectations, the field graders will not likely change their behaviors. The Air Force will continue to determine general officer promotion potential based on what is arguably tactical performance. If you doubt that this is happening now, ask yourself why general officers continue to wear U.S. Air Force Weapons School patches on their uniforms: an artifact denoting technical excellence earned as a company-grade officer, approximately two decades in their past. Though some tactical experts mature into outstanding strategic performers, it is clear from the number of failed leaders we encounter that the former does not always beget the latter.

Second, our evaluation should contain objective and quantifiable data that is indicative of officer effectiveness in a given area. The current evaluation system is entirely subjective: the opinion of one individual, at best two, with the first rater heavily influencing the second. That opinion may be important, but reliably attainable and objective information for the entire population is more important. I contend the absence of this type of information is what has led to the development of, and overreliance on, stratification, as well as the disproportionate emphasis on performance in professional military education as an indicator of officer performance. The primary challenge here is to find the indicators of performance from authoritative sources that point to actual effectiveness, are difficult to manipulate, and do not conflate quantity with quality. The system must encourage members to focus on fulfilling their duty and not on gaming the system.

Third, and related to the second concept, the Air Force needs to assess its officers’ ability to lead. There are thousands of competing theories on leadership, and at the heart of most of them is the concept of trust. I think the question of trust has three main aspects: Do you do what you say you’ll do? Do you know what you’re doing? Do you put the interests of the team before your self-interest? If these seem to ring a bell, it is because they tie back to the Air Force core values. Other, more scholarly work suggests that the dimensions of trust are benevolence, competence, integrity, and predictability. Regardless, supervisors are not always in the best position to assess subordinate leaders; the officer’s subordinates are in the best position. The challenge here is to capture the perspective of the subordinates without turning leadership into a popularity contest. Leadership is about doing the right thing, not necessarily the popular thing. That being said, this truism should not be used as an excuse by supervisors to do as they wish without due consideration of those being led. It is a matter of trust. If you violate our airmen’s trust, you will be unpopular. Our airmen will support leaders who take the time to explain why something is being done even if they don’t personally agree. In the absence of a reason and trust, many will assume their chain of command is self-interested or lacks the courage to do the right thing.

Fourth, avoid zero-sum performance appraisal tools. Zero-sum assessment tools like stratifications or forced distributions tend to anchor people to a certain place without a realistic hope of changing their fortunes.  Just as there is a “halo” effect (i.e., the person can do no wrong) there is a “horns” effect (i.e., the person is below average or worse).  The Air Force needs a system its personnel can trust because it captures current performance objectively without the anchoring effect of previous evaluations. Supervisors need to capture and assess the intangibles that a purely quantitative assessment would miss, but these qualitative offerings should not become the sole (or even the main) source for the entire evaluation. Raters need to give a recommendation for command or staff positions. Any subjective categorization of performance relative to peers should be limited to discrete events (e.g., quarterly or annual awards).  The supervisors should hold their subordinates with supervisory responsibilities accountable for their performance evaluations, feedback and talent-management recommendations.   Those raters who make good recommendations (e.g., someone recommended for command later succeeds in that position) should have their superior judgment captured on their evaluations. Just as importantly, those who made bad recommendations (e.g., someone they recommended for command was fired) should have their flawed judgment noted on their evaluations. In other words, senior officers need to be accountable in a tangible way for the assessments of their subordinates. Such a system incentivizes candor, mentoring, and coaching.

The Air Force has taken some important but arguably small steps toward improving the officer evaluation and promotion system, but we are far from complete. Those steps were simple compared to the task we still confront — revamping the entire evaluation system. Half measures will not do it, and tweaks to stratification will not do it. We should put the lipstick down and step away from the pig. “Fixing” stratifications still leaves supervisors in the position of “passing judgment on the personal worth of subordinates” — an admonition from Douglas McGregor going back to 1957. The Air Force will know that the evaluation system is right when its members say it is, and we have a lot of work to do before that day arrives. I say “we” because if the Air Force does this right, the folks on the Air Staff will broadcast what they are thinking instead of working this in a vacuum. I can tell you that our airmen have some tremendous passion and great thoughts to contribute if the Air Force will tap into them. I know because they are writing me their thoughts in large numbers and have trusted me enough to share some of them. Thoughts not just about officer matters, but civilian and enlisted issues as well. Giving the force an opportunity to provide input will not only improve the result, it will also enhance trust – that vital foundation of all relationships to include the sacred one between the leader and the follower.


Col. ‘Ned Stark’ is an Air Force officer. His opinions are his alone and do not represent those of the U.S. Air Force, the Department of Defense, or any part of the U.S. government, but he hopes one day they will come closer.

Image: Staff Sgt. Jacob N. Bailey