
AI Productivity Measurement: Insights and Data Trends


In this article, we explore AI productivity measurement and its impact in today’s workplace.

I pulled the AI productivity measurement data from our Wrike system last week, expecting a clean win story. Twenty-seven AI-tagged tasks across our most recent compensation consulting projects. Planned hours: 150. Actual hours: 183. We ran 22% over plan, not under.

The headline looked like bad news for AI adoption on our own team. After two weeks with the data, I think it is the right number. It is also the most useful AI measurement story we have found so far.

The AI Productivity Measurement Headline Looked Bad

Our team had been tagging AI work in Wrike for several months before I sat down with the numbers. The tagging system captures six dimensions on every task that touches AI. Those dimensions include integration level, human-AI collaboration model, tool used, readiness barrier, skill development category, and conversion mechanism. Planned versus actual hours and fees are recorded alongside the tags. This post focuses on the five dimensions in which the data were most readable across the 27-task sample; the skill development category will receive its own analysis once the sample is larger. Our goal was simple. We wanted to measure AI productivity on our own US-based compensation team before recommending measurement approaches to clients.

Figure: Recommended five-tag system for AI productivity measurement (table showing the five key tags used in MorganHR’s AI productivity tracking system).

When I aggregated the first 27 completed AI-tagged tasks, the gap was hard to miss. Planned hours totaled 150. Actual hours came in at 183. Across the board, our analysts and consultants were spending more time per task with AI, not less.
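
For readers who want to reproduce the headline figure, here is a minimal sketch of the arithmetic. The function name `overrun_pct` is my own label for illustration, not something from our Wrike setup; the two inputs are the planned and actual totals quoted above.

```python
def overrun_pct(planned_hours: float, actual_hours: float) -> float:
    """Return the percentage over plan (positive) or under plan (negative)."""
    if planned_hours <= 0:
        raise ValueError("planned_hours must be positive")
    return (actual_hours - planned_hours) / planned_hours * 100


# Totals from the 27-task sample: 150 planned hours, 183 actual hours.
headline = overrun_pct(150, 183)
print(f"{headline:.0f}% over plan")  # prints "22% over plan"
```

A positive result means the work ran over plan; a negative result would be the "hours saved" story most reports lead with.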

That number contradicted every assumption I walked in with. AI is supposed to save time. Vendor decks promise speed. Industry benchmarks report hours saved. Then I saw a 22% overrun on tasks where we were specifically using AI tools. My first instinct was to look for a measurement error.

There wasn’t one. The data was clean. Our team really was investing more hours per AI-touched task than we had planned. After a week of staring at it, I started asking a different question. What if “over plan” is the right outcome for this stage of AI adoption? And what if “saved hours” is the wrong measurement frame entirely?

One note on scope before I go further. These 27 tasks cover compensation consulting work. Categories include merit plan design, total rewards structuring, SimplyMerit deployment configuration, and client deliverable drafting. The findings reflect expert knowledge work, not high-volume transactional work. AI productivity measurement looks different in those two contexts, and I will come back to that distinction below.

Why Saved Hours Misses the Point of AI Productivity Measurement

Most of the published AI productivity measurement I see reports a single metric. Whether the source is a vendor report, a consulting firm study, or an internal HR dashboard, that metric is hours saved. The Stanford AI Index Report 2025 documents the same pattern at a much larger scale: AI adoption is now widespread across organizations, yet productivity reporting still leans almost entirely on speed metrics, even as the underlying research points to quality and capability gains showing up earlier in the adoption curve. Saved hours are easy to count. Yet they only matter if they convert into something the business actually cares about. That means a change to financial outcomes, capacity, or risk exposure.

Saved hours convert into value through one of four mechanisms. The first is Speed: the same work, done in less time. Quality is the second: the same time invested, but a better deliverable. Capability comes third: work that the team could not have done before, now in scope. Last is Risk reduction: the same work, with lower downstream exposure to errors, compliance gaps, or rework.

Figure: MorganHR’s conversion mechanism framework for AI productivity measurement (diagram of the four conversion mechanisms: Speed, Quality, Capability, and Risk reduction). Saved hours only matter if they convert into Speed, Quality, Capability, or Risk reduction.

Here is what most measurement frameworks miss. In expert knowledge work, Speed is usually the last mechanism to show up, not the first. Quality, Capability, and Risk reduction arrive earlier in the adoption curve because that is where AI changes the work itself. Speed comes only after the team has built reliable patterns for using AI on real tasks. Building those patterns takes deliberate time.

Our 22% overrun was the team doing exactly that work. They were drafting prompts, validating outputs, and comparing AI-generated language to client-tested language. The team also redid pieces that did not pass review and developed internal standards for what “finished” looks like when AI is in the loop. None of that work shows up as “hours saved.” All of it shows up as Quality, Capability, and Risk conversion, which most measurement dashboards do not track.

Three Patterns That Reframed How I Read the AI Productivity Data

Once I stopped reading the data as a speed report and started reading it as a conversion report, three patterns came forward. Each one surprised me. Together, they pointed toward measurement choices we would have missed if we had only tracked saved hours.

Pattern 1: Collaboration Model Beats Tool Choice in AI Productivity Data

Two collaboration models dominated the data. Tasks tagged “AI Drafts then Human Refines” ran 17% over plan. The same team, using the same tools, ran 210% over plan on tasks tagged “Human Outlines, then AI Expands, then Human Edits.” The two time signatures were nowhere near each other. Most teams I talk to debate which AI tool to license. Based on our data, the more important variable is how humans and AI hand work back and forth. Handoff structure determines how much rework and validation the human has to absorb.

Pattern 2: “Ready to Adopt” Was Our Most Expensive Readiness State

Tasks tagged with no AI readiness barrier ran 235% over plan. By comparison, tasks tagged “Risk and Compliance Concerns” ran only 11% over plan. When the team felt no friction, they planned less rigorously, and the work expanded to fill the unprepared space. Caution forced better planning. Teams that flagged compliance concerns built in time for legal review, slowed down on prompt design, and ended up closer to the plan. By contrast, the team that felt comfortable with AI also felt comfortable underestimating, and the data caught it. This was the most counterintuitive pattern in the dataset. Since then, it has changed how I scope any AI-touched engagement.

Pattern 3: Most AI Work Was Not in the Original Plan

Across most projects, AI-tagged tasks were not in the original scope. They emerged during execution. That is not a planning failure on its own. Instead, it is a sign that AI enabled work the team could not have scoped in advance. This is the Capability conversion mechanism showing up in real data: work that would not have existed without the tool. Most measurement frameworks miss this entirely. The reason is that they start from a planned-versus-actual frame and treat unplanned tasks as scope creep by default. Our data suggests that some of that “creep” is the actual value showing up in the AI productivity data.

What HR Leaders Can Borrow From Our AI Productivity Measurement Setup

If you are running AI productivity measurement in your own organization, three takeaways from our data may be worth testing against yours. First, do not stop at saved hours. Build the conversion mechanism (Speed, Quality, Capability, or Risk) into your tagging. That way, you can tell which kind of value showed up, not just whether time went up or down. Second, track the collaboration model separately from the tool choice. The handoff structure between human and AI is doing more work in the time data than the tool brand is. Third, watch your “no barrier” tasks. Comfort with AI correlated, in our data, with looser planning and bigger overruns.

A simple decision framework for HR Directors: when an AI-touched project goes over plan, do not default to “AI failed.” Ask three questions before drawing that conclusion. Did the deliverable get materially better? Has the team taken on work it could not have done before? Were we able to reduce a downstream risk? If yes to any of those, the value is real even when the hours went the wrong way. A “no” on all three indicates genuine scope creep, which is also useful information.
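
The three questions above can be sketched as a small function. This is an illustration only: the class name, field names, and the "Scope creep" label are assumptions made for the example, not part of our tagging system.

```python
from dataclasses import dataclass


@dataclass
class OverPlanReview:
    """Answers to the three questions for one over-plan, AI-touched project."""
    deliverable_materially_better: bool  # maps to Quality
    new_work_now_possible: bool          # maps to Capability
    downstream_risk_reduced: bool        # maps to Risk reduction


def classify_overrun(review: OverPlanReview) -> list[str]:
    """Return the conversion mechanisms that showed up, or scope creep if none did."""
    mechanisms = []
    if review.deliverable_materially_better:
        mechanisms.append("Quality")
    if review.new_work_now_possible:
        mechanisms.append("Capability")
    if review.downstream_risk_reduced:
        mechanisms.append("Risk reduction")
    return mechanisms or ["Scope creep"]


print(classify_overrun(OverPlanReview(True, False, True)))
print(classify_overrun(OverPlanReview(False, False, False)))
```

Returning a list rather than a single label keeps the case where more than one kind of value showed up on the same project.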

Honest AI productivity measurement is going to beat the optimistic version on every multi-quarter horizon. CFOs will not trust speed numbers that do not survive scrutiny. The fastest way to lose credibility on AI investment is to report a number that contradicts what people are seeing on the ground. Our 22% overrun is a number our team recognizes when they see it, which is what makes it usable.

Key Takeaways

AI productivity measurement that stops at saved hours misses where the value actually shows up first in expert knowledge work: Quality, Capability, and Risk reduction, not Speed.
Our team ran 22% over plan across 27 AI-tagged tasks. That overrun reflects investment in prompt design, output validation, and workflow patterns the team did not have before.
The collaboration model is a stronger predictor of time signature than tool choice. How human and AI hand work back and forth matters more than which AI brand the team uses.
“Ready to adopt” tasks ran 235% over plan in our data. Comfort with AI correlated with looser planning. Caution forced better scoping.
Tag every AI-touched task with a conversion mechanism so you can answer the CFO question: where did the value actually land?

Quick Implementation Checklist

Pick one AI-touched workstream to instrument first. Do not boil the ocean.
Add five tags to every task in that workstream. The tags are AI integration level, collaboration model, tool used, readiness barrier, and conversion mechanism (Speed, Quality, Capability, or Risk).
Record planned versus actual hours on every tagged task. Without that comparison, the conversion data has no anchor.
Wait until you have at least 25 completed tagged tasks before drawing any conclusion. Smaller samples will mislead.
Run the analysis monthly once the volume is there. Look for patterns by collaboration model and by readiness barrier, not just averages.
Report findings honestly. If the headline number is bad, lead with it and explain the conversion data underneath. CFOs trust numbers that contradict expectations more than numbers that confirm them.
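
As a rough sketch of how the checklist could translate into an analysis script, the example below models a tagged task and groups overruns by any tag. Everything in it is hypothetical: the field names, the three sample tasks, their tag values, and their hours are invented for illustration and are not our data. It also assumes a pooled calculation (total actual hours against total planned hours per group), which is one reasonable choice rather than the only one.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_SAMPLE = 25  # checklist threshold: completed tagged tasks before drawing conclusions


@dataclass
class TaggedTask:
    integration_level: str
    collaboration_model: str
    tool_used: str
    readiness_barrier: str
    conversion_mechanism: str  # Speed, Quality, Capability, or Risk
    planned_hours: float
    actual_hours: float


def ready_to_analyze(tasks: list[TaggedTask]) -> bool:
    """True once the sample is large enough to read patterns from."""
    return len(tasks) >= MIN_SAMPLE


def overrun_by(tasks: list[TaggedTask], tag: str) -> dict[str, float]:
    """Pooled percent over plan for each value of the given tag field."""
    totals = defaultdict(lambda: [0.0, 0.0])  # tag value -> [planned, actual]
    for task in tasks:
        key = getattr(task, tag)
        totals[key][0] += task.planned_hours
        totals[key][1] += task.actual_hours
    return {
        key: (actual - planned) / planned * 100
        for key, (planned, actual) in totals.items()
        if planned > 0
    }


# Hypothetical tasks, for illustration only.
sample = [
    TaggedTask("Assisted", "AI Drafts then Human Refines", "Tool A", "None", "Quality", 10, 12),
    TaggedTask("Assisted", "AI Drafts then Human Refines", "Tool A", "Risk and Compliance Concerns", "Risk", 10, 11),
    TaggedTask("Embedded", "Human Outlines, then AI Expands, then Human Edits", "Tool A", "None", "Capability", 4, 8),
]

by_model = overrun_by(sample, "collaboration_model")
for model, pct in by_model.items():
    print(f"{model}: {pct:.0f}% over plan")
print("Enough data to analyze:", ready_to_analyze(sample))
```

Calling `overrun_by(sample, "readiness_barrier")` gives the second cut the checklist recommends without any change to the code.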

Frequently Asked Questions About AI Productivity Measurement

AI Productivity Measurement Questions From HR Leaders

Q: How big a sample size do I need before AI productivity measurement gives me a real signal?

A: At least 25 completed AI-tagged tasks per workstream is a reasonable minimum. Below that threshold, individual outliers will distort the patterns. Above it, the collaboration model and readiness barrier patterns become readable. Importantly, the goal is not statistical certainty; the goal is enough data to challenge your assumptions.

Q: We do not have a project management system that supports custom tags. Can we still do this?

A: Yes, though it is slower. Build a parallel spreadsheet with the same five tag categories and update it weekly. Many teams start here before they invest in tooling. After a quarter of manual tracking, you will know whether the patterns are worth automating.
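
If it helps to see the spreadsheet layout concretely, here is a minimal sketch that generates a CSV with the five tag columns plus planned and actual hours. The column names and the sample row are assumptions made for the example; rename them to match your own vocabulary.

```python
import csv
import io

# Hypothetical column names: one task identifier, the five tags, and the two hour fields.
COLUMNS = [
    "task_name",
    "integration_level",
    "collaboration_model",
    "tool_used",
    "readiness_barrier",
    "conversion_mechanism",
    "planned_hours",
    "actual_hours",
]


def new_tracking_sheet(rows: list[dict]) -> str:
    """Return CSV text with a header row followed by one line per tagged task."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One invented row, for illustration only.
sheet = new_tracking_sheet([
    {
        "task_name": "Merit plan design draft",
        "integration_level": "Assisted",
        "collaboration_model": "AI Drafts then Human Refines",
        "tool_used": "Tool A",
        "readiness_barrier": "None",
        "conversion_mechanism": "Quality",
        "planned_hours": 6,
        "actual_hours": 8,
    }
])
print(sheet)
```

The resulting text can be saved as a `.csv` file and opened in any spreadsheet tool for the weekly update.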

Q: What is the single most important field to add if I can only add one?

A: Conversion mechanism. Without a Speed, Quality, Capability, or Risk tag, you cannot interpret time data correctly. Specifically, an over-plan task with a “Capability” tag is a different story from an over-plan task with a “Speed” tag. That difference matters when you report to leadership.

AI Productivity Measurement Questions From Executives

Q: How do I explain a 22% over-plan number to my CFO without losing credibility on AI investment?

A: Lead with the number; do not talk around it. Then walk the CFO through the conversion mechanisms. Identify which AI-touched tasks delivered a Quality improvement, which delivered a Capability the team did not have before, and which reduced downstream risk. CFOs trust people who report data that contradicts expectations and explain it. They do not trust people whose numbers always confirm the budget pitch.

Q: Is AI productivity measurement going to look the same in expert knowledge work as in transactional work?

A: No. Transactional work includes claims processing, basic data entry, and ticket routing. That kind of work converts saved hours to Speed quickly because the work is well defined and AI replaces a clear step. Expert knowledge work (strategy, advisory, custom analysis) converts saved hours to Quality and Capability first, and to Speed last. Both are real AI adoption stories. They just unfold on different timelines.

Q: When should we expect to see Speed conversion in expert knowledge work?

A: Based on what we are seeing, Speed conversion in expert work typically appears 12 to 24 months into deliberate AI adoption, after the team has built repeatable patterns. The McKinsey State of AI 2025 survey reports a similar pattern at scale, with most organizations realizing quality and capability gains earlier than measurable speed gains. Until then, expect Quality and Capability gains and measure those instead. Otherwise, you will report an early-stage adoption story as a failure when it is actually working as expected.

AI Productivity Measurement Questions From Implementation Teams

Q: Our team is comfortable with AI. Should we still flag readiness barriers?

A: Especially if you are comfortable. In our data, “no barrier” tasks ran the furthest over plan. Comfort correlated with looser scoping. Force the team to name at least one barrier on every tagged task, even if it is a small one. The planning discipline that follows is where the time accuracy comes from.

Q: Does this approach apply to teams using SimplyMerit or other compensation administration platforms?

A: Yes, with one adjustment. Platform deployments have a more defined scope than open-ended advisory work, so Speed conversion tends to show up earlier. Still, the same four mechanisms apply. Tagging deployment tasks by conversion mechanism gives you a cleaner read on where AI is helping versus where the team is still building patterns.

Want to Compare Notes on Your Own AI Productivity Measurement?

If you are building AI work tracking on your own team and want to compare notes, reach out. I am happy to share the tagging schema we use internally and walk through what each field is meant to catch. Honest measurement is more useful when more teams are doing it the same way.

Contact the MorganHR team »

Author’s note: This post was drafted with AI assistance, edited by hand, and logged in the same Wrike tagging system referenced above. Conversion mechanism: Quality (the AI helped me structure a long-form argument I had been circling for two weeks). Time signature: roughly 4 hours actual against 1.5 hours planned. About what the data above predicted.