Before surgeons perform knee replacements, they need to know exactly how the patient’s knee is aligned. That means manually placing markers on an X-ray of the patient’s full leg from hip to ankle and measuring the precise angles that will guide the procedure. It’s a task that must be done for every patient, takes roughly 15 minutes each time and must be completed by a surgeon whose time is among the most expensive in medicine.
William Anderst, Associate Professor of Orthopaedic Surgery at the University of Pittsburgh School of Medicine, knew this was a problem that technology could solve. He brought it to the University of Pittsburgh Cloud Innovation Center (CIC), powered by AWS, and the result is an AI-powered tool that automatically identifies anatomical landmarks on knee X-rays and calculates the measurements surgeons need, with a mean error of less than one degree.
The Challenge: Manual Measurements, High Stakes
More than 700,000 knee replacement surgeries are performed in the United States every year. Before surgery, surgeons take a long-leg radiograph, a weight-bearing X-ray from the hip all the way down to the ankle, and manually place markers at the center of the femoral head, the center of the knee joint and the center of the ankle. From those points, they draw lines along the femoral and tibial mechanical axes, measure the resulting angles and use those measurements to guide how they will align the prosthetic joint.
“It takes about 15 minutes per patient. With 10 or more surgeons doing this at UPMC alone, we’re talking about a significant amount of expensive time—and there’s variability from one surgeon to the next in where they place those markers,” says Anderst.
Anderst had a medical student spend an entire day manually measuring just 14 patients for a research study. “I thought, ‘My goodness, this is something that should easily be done by a computer,’” he says. “You’re finding the center of a circle and drawing a line tangent to two bones. This should be automatable.”
The specific measurements involved are called CPAK (Coronal Plane Alignment of the Knee) measurements: the Lateral Distal Femoral Angle (LDFA) and the Medial Proximal Tibial Angle (MPTA). Together, they classify how a patient’s knee is aligned before surgery, informing whether the surgeon should aim to restore the patient’s natural alignment or correct it to a neutral position.
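To make the classification step concrete, here is a minimal sketch of how LDFA and MPTA map to a CPAK type. It follows the published CPAK scheme (arithmetic HKA = MPTA − LDFA, joint line obliquity = MPTA + LDFA, with ±2° and 180° ± 3° neutral bands). Whether the CIC tool performs this step, or uses these exact boundaries, is an assumption, and the function name cpak_type is hypothetical.

```python
def cpak_type(ldfa_deg: float, mpta_deg: float) -> int:
    """Return the CPAK type (1 to 9) for a pair of angles in degrees.

    Assumes the published CPAK boundaries; not confirmed as the tool's logic.
    """
    ahka = mpta_deg - ldfa_deg  # arithmetic hip-knee-ankle angle
    jlo = mpta_deg + ldfa_deg   # joint line obliquity

    # Column: varus (< -2), neutral (-2 to +2), valgus (> +2)
    if ahka < -2:
        col = 0
    elif ahka > 2:
        col = 2
    else:
        col = 1

    # Row: apex distal (< 177), neutral (177 to 183), apex proximal (> 183)
    if jlo < 177:
        row = 0
    elif jlo > 183:
        row = 2
    else:
        row = 1

    # Types are numbered left to right, top to bottom in the 3x3 grid.
    return row * 3 + col + 1
```

For example, a knee with LDFA and MPTA both at 87 degrees has a neutral aHKA and an apex-distal joint line, which is Type II in this scheme.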
The Solution: AI That Reads X-Rays
CIC student developer Gary Farrell developed an automated CPAK measurement tool built on a U-Net, a type of convolutional neural network well-suited for medical image analysis. The model takes a long-leg radiograph as input, analyzes both legs by processing each side individually and outputs a set of heatmaps that identify where eight key anatomical landmarks are located on the X-ray.
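One common way to turn a heatmap into a landmark coordinate is to take the location of its highest value. The sketch below shows that idea in plain Python. The article does not say how the tool decodes its heatmaps, so the argmax approach and the names heatmap_peak and decode_landmarks are assumptions for illustration only.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows of pixel scores, indexed [y][x]


def heatmap_peak(heatmap: Heatmap) -> Tuple[int, int]:
    """Return (x, y) of the highest-scoring pixel in one heatmap."""
    best_x, best_y, best_val = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_x, best_y, best_val = x, y, val
    return best_x, best_y


def decode_landmarks(heatmaps: List[Heatmap]) -> List[Tuple[int, int]]:
    """Decode one (x, y) landmark per heatmap, in the order given."""
    return [heatmap_peak(h) for h in heatmaps]
```

A production pipeline would more likely do this on arrays and might refine the peak to sub-pixel precision; the nested loops here simply keep the sketch free of dependencies.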
From those predicted landmark locations, the system automatically draws the required lines along the distal femur and proximal tibia, calculates the LDFA and MPTA angles and produces a superimposed image that shows exactly where the measurements were taken. Importantly, that overlay gives surgeons a quick visual check before they accept the output.
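The geometry behind this step is the angle between two lines, each defined by a pair of landmarks. The sketch below computes LDFA as the lateral angle between the femoral mechanical axis and the distal femoral joint line, and MPTA as the medial angle between the tibial mechanical axis and the proximal tibial joint line. The landmark names and function signatures are hypothetical; the article does not list the eight landmarks or describe the tool's exact computation.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixels, y increasing downward


def angle_between(a_from: Point, a_to: Point, b_from: Point, b_to: Point) -> float:
    """Unsigned angle in degrees (0 to 180) between vectors a and b."""
    ax, ay = a_to[0] - a_from[0], a_to[1] - a_from[1]
    bx, by = b_to[0] - b_from[0], b_to[1] - b_from[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    return math.degrees(math.atan2(abs(cross), dot))


def ldfa(hip_center: Point, femur_knee_center: Point,
         femur_lateral: Point, femur_medial: Point) -> float:
    """Lateral angle between the femoral mechanical axis and the joint line."""
    # Axis vector runs from the knee up to the hip; the joint-line vector
    # runs from the medial point toward the lateral point.
    return angle_between(femur_knee_center, hip_center, femur_medial, femur_lateral)


def mpta(ankle_center: Point, tibia_knee_center: Point,
         tibia_lateral: Point, tibia_medial: Point) -> float:
    """Medial angle between the tibial mechanical axis and the joint line."""
    # Axis vector runs from the knee down to the ankle; the joint-line vector
    # runs from the lateral point toward the medial point.
    return angle_between(tibia_knee_center, ankle_center, tibia_lateral, tibia_medial)
```

Using atan2 on the cross and dot products, rather than acos on a normalized dot product, keeps the result stable when the lines are nearly perpendicular, which is the typical case for these angles.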
"We trained the model on over 300 manually annotated radiographs from the NIH Osteoarthritis Initiative," says Farrell. "One trick that really helped was flipping images of left legs to look like right legs during training, then un-flipping the predictions. It cut down the complexity of what the model needed to learn without sacrificing accuracy."
CIC student developer Eric Poplavsky proposed an AI-assisted annotation workflow the team used to scale up labeling. Rather than placing each landmark from scratch, annotations started from predictions made by previous model prototypes and were adjusted only as needed, significantly cutting the time required per image.
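The left-to-right flipping trick Farrell describes comes down to two small operations: mirror the image before inference, then mirror the predicted x coordinates back. Here is a minimal sketch using nested lists as a stand-in for image arrays; the function names are hypothetical and the team's actual implementation may differ.

```python
from typing import List, Tuple


def flip_horizontal(image: List[List[float]]) -> List[List[float]]:
    """Mirror an image left to right (each row is reversed)."""
    return [row[::-1] for row in image]


def unflip_points(points: List[Tuple[int, int]], width: int) -> List[Tuple[int, int]]:
    """Map (x, y) points predicted on a flipped image back to the original."""
    # Pixel column x in the flipped image is column (width - 1 - x) originally.
    return [(width - 1 - x, y) for x, y in points]
```

Because the flip is its own inverse, the same flip_horizontal call can be used for both the forward and return direction on images, and unflip_points does the same for coordinates.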
CIC student developer Matthew Lu built the training and evaluation pipeline that made rapid iteration on the model possible. Using AWS services including S3, SageMaker and Amplify, the pipeline pulls annotated radiographs from S3, runs the training script on SageMaker's GPU instances and automatically publishes an evaluation report to an Amplify-hosted website once training completes.
"Training the U-Net on a local machine took about an hour per run, which adds up fast when you're tuning hyperparameters or testing a new annotation batch," says Lu. "Moving the workload to SageMaker brought that down to around 15 minutes, and just as importantly, every run produces a shareable report the whole team can look at without having to dig through logs or rerun anything locally."
Each report surfaces the metrics that matter for a keypoint detection model at a glance: overall mean error, median error, RMSE and mean OKS, alongside Percentage of Correct Keypoints (PCK) thresholds from PCK@10 to PCK@50. A per-keypoint breakdown shows how the model performs on each of the eight anatomical landmarks individually, making it easy to spot which points are pulling overall accuracy down. The report also logs the full hyperparameter set and training job metadata, so any run can be traced back to the exact configuration that produced it.
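Several of these metrics can be computed directly from per-keypoint distances between predicted and annotated positions. The sketch below covers mean error, median error, RMSE and PCK. It assumes the PCK thresholds are distances in pixels, which the article does not state, and it leaves out OKS because that metric needs per-keypoint scale constants the article does not give. The function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def keypoint_errors(pred: Sequence[Point], truth: Sequence[Point]) -> List[float]:
    """Euclidean distance between each predicted and annotated keypoint."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def summarize(errors: List[float],
              thresholds: Sequence[float] = (10, 20, 30, 40, 50)) -> Dict[str, float]:
    """Aggregate error statistics plus PCK at each threshold (assumed pixels)."""
    report = {
        "mean_error": statistics.fmean(errors),
        "median_error": statistics.median(errors),
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
    }
    for t in thresholds:
        # PCK@t: fraction of keypoints whose error is within the threshold.
        report[f"pck@{t}"] = sum(e <= t for e in errors) / len(errors)
    return report
```

Running summarize separately on the errors for each landmark would give the kind of per-keypoint breakdown the report describes.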
On the test dataset the CIC team used, the mean absolute error for the final CPAK angles was less than one degree.
For context, the target Anderst set for the project was replicating experienced user measurements within one degree. The tool meets that bar.
Built for Surgeons, Not Just Researchers
A key design principle for this tool was that surgeons would never have to take the output on faith. Rather than simply returning a number, the solution overlays the predicted landmark positions and measurement lines directly onto the X-ray, so a surgeon can glance at the output and confirm the computer didn’t do anything unexpected before accepting the result.

“They wouldn’t need to zoom way in on the joint to verify every point. They would see the image, say ‘Oh yeah, that’s about the hip center, that’s about the knee center—looks right,’ and move on,” says Anderst. “Surgeons want the measurement done. This gives them that, with a quick sanity check built in.”
The human-in-the-loop design also reflects a practical reality: Even a highly accurate model won’t be adopted if surgeons can’t trust it. By making the model’s reasoning visible, the tool earns that trust.
Broader Applications: Beyond the Knee
With more than 700,000 knee replacements performed annually in the United States alone, not to mention hip replacements and other joint procedures, the potential time savings from automating this step are substantial across the field of orthopaedic surgery. The same underlying approach could be adapted to other procedures that rely on manual measurements, whether taken from medical images or from other sources.
Anderst also sees a path to publication: Open-sourcing a validated, high-accuracy tool that eliminates variability between users and dramatically reduces measurement time is the kind of contribution that can move the field forward.
Supporting Artifacts
Interested in implementing this solution? Explore the code, technical documentation, and demo on GitHub.
Have your own project idea? The Pitt Cloud Innovation Center accepts project proposals from University of Pittsburgh staff and faculty. Submit your idea today to see how cloud innovation can accelerate your work.
The University of Pittsburgh Cloud Innovation Center, powered by AWS, builds impactful, scalable solutions using cloud computing, artificial intelligence, and machine learning. The Pitt CIC delivers open-source proof-of-concept solutions that address real-world challenges across the university and beyond.
Learn more: digital.pitt.edu/cic