ADR & Algorithms

The multizone inspection of titanium forgings has a mathematical basis for deciding whether or not a part is acceptable.

Titanium aircraft engine disk forgings are imaged at all material depths using the ultrasonic testing (UT) process called multizone inspection. Ultrasonic multizone inspection consists of an ultrasonic system set up in pulse-echo mode, with transducers focused at specific depths, or zones, beneath the material’s surface. The system captures C-Scan images of the zones, in which each pixel contains the maximum reflected amplitude within the zone for a location on the surface of the test part. The test part is inspected in multiple zones, and a C-Scan image is produced for each zone. During multizone inspection, the transducer is focused so that the beam width varies little from zone to zone, keeping image spatial resolution roughly constant across the C-Scan images [1].

Titanium is a particularly difficult metal to inspect due to the large-grain microstructure found in most titanium alloys. Because of this, images exhibit widely varying background brightness against which areas that potentially exceed the material acceptance criteria must be located. This large-grained microstructure, found in the pre-forged material called billet, is compounded by the forging process, which molds, folds, and perturbs the microstructure into an extreme case of background inhomogeneity. The result is a set of images that is very difficult for a human to analyze consistently.

Nevertheless, the inspector must identify and measure areas that may exceed the acceptance criteria. These areas are called indications. During a manual process, the inspector first locates a potential indication and defines a signal region of interest (box) and a noise region of interest around the indication. These regions of interest are used to measure the indication’s signal-to-noise ratio (SNR), defined as SNR = (Ps − μn) / (Pn − μn), where Ps and Pn are the peaks in the signal and noise regions, respectively, and μn is the mean of the noise region.

The operator tries to capture a noise region that contains only homogeneous noise similar to that found near the indication. If the SNR exceeds a set threshold, the inspector flags the indication and rejects the part; otherwise, the inspection moves on.
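This SNR measurement can be sketched in a few lines of Python. The (Ps − μn)/(Pn − μn) form is the common definition consistent with the quantities named above; the array and region-of-interest names are illustrative, not from the production software:

```python
import numpy as np

def snr(image, signal_roi, noise_roi):
    """Signal-to-noise ratio of an indication in a C-Scan image.

    signal_roi / noise_roi are (row_slice, col_slice) boxes.
    Assumes SNR = (Ps - mu_n) / (Pn - mu_n), where Ps and Pn are the
    peak amplitudes in the signal and noise boxes and mu_n is the
    mean amplitude of the noise box.
    """
    p_s = image[signal_roi].max()   # peak in the signal box
    noise = image[noise_roi]
    p_n = noise.max()               # peak in the noise box
    mu_n = noise.mean()             # mean of the noise box
    return (p_s - mu_n) / (p_n - mu_n)

# Toy image: flat background at 10, one bright pixel of 40 in the
# signal box, and mild amplitude variation in the noise box.
img = np.full((20, 20), 10.0)
img[5, 5] = 40.0                       # the indication
img[10:, :] += np.tile([0.0, 2.0], 10) # alternating-column noise
val = snr(img, (slice(0, 8), slice(0, 8)), (slice(10, 20), slice(0, 10)))
```

Here the noise box has mean 11 and peak 12, so the toy indication scores (40 − 11)/(12 − 11) = 29, far above a 2.5 threshold.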

Because the image background is naturally inhomogeneous, drawing a repeatable, homogeneous box for the noise region is very difficult. While the region’s mean is affected less by this problem, the region’s peak is a much more variable statistic. For marginal indications, that is, indications with an SNR very near the acceptance threshold, the choice of noise region can by itself determine whether the part is accepted, as the two manually drawn noise boxes in Figure 1a show. This issue is addressed by an assisted defect recognition (ADR) algorithm that classifies all automatically detected potential indications based on carefully designed rules that consistently define each indication’s noise region. With this algorithm in place, the multizone inspection of titanium forgings has a mathematical basis for deciding whether or not a part is acceptable.


Algorithm Overview
The objective of forging ADR is two-fold: to automatically detect potential indications hidden within the UT C-Scan images, and to classify these indications by their SNR level. Since the ADR algorithm contains a classifier designed to filter out potential indications, the detection portion of the algorithm is tuned for a high overall detection rate to ensure no potential indications are overlooked. The detection portion is based heavily on the Dynamic Threshold algorithm of Howard et al. [2], which is used for UT billet ADR. It works by generating a model of the C-Scan background amplitudes and comparing it to the amplitudes of the actual image. Because of forged titanium’s background inhomogeneity, the algorithm can be tuned for a very high probability of detection, but it produces a relatively high false positive rate as well. Therefore, the detection results of the Dynamic Threshold are fed into an SNR classifier called Auto-SNR, which carefully determines a noise region to isolate only those indications that violate the specification’s SNR threshold. The algorithm details can be found in Ferro and Howard [3].

Given the output of Dynamic Threshold and Auto-SNR, with sample results shown in Figure 1b and Figure 1c, ADR can calculate a final count of indications and their relevant statistics, such as SNR and peak amplitude.
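The background-model-and-compare idea behind the detection stage can be illustrated with a deliberately simplified sketch. The per-row median background model and the fixed amplitude offset below are stand-ins for illustration only, not the published Dynamic Threshold model:

```python
import numpy as np

def detect_candidates(cscan, offset=6.0):
    """Flag pixels that rise above a simple model of the image background.

    The background model here is just each row's median amplitude, an
    illustrative stand-in for the actual Dynamic Threshold background
    model; any pixel exceeding its row's median by `offset` becomes a
    candidate-indication pixel.
    """
    background = np.median(cscan, axis=1, keepdims=True)  # per-row model
    return cscan > background + offset                    # boolean mask

# Toy C-Scan: noisy background near amplitude 10 plus one bright
# 3x3 reflector standing in for an indication.
rng = np.random.default_rng(0)
img = 10.0 + rng.normal(0.0, 1.0, size=(50, 50))
img[20:23, 30:33] = 25.0
mask = detect_candidates(img, offset=6.0)
```

With the offset set this far above the noise, only the nine bright pixels are flagged; a lower offset raises the detection rate at the cost of more false positive pixels, which mirrors the tuning trade-off described above.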


Validation Results
There are two sets of images that define the validation data for ADR. The first is a set of 85 production and simulated images with indications, called the ADR database. Twenty-five production images are from parts that either contain real rejectable indications (highly reflective grains or voids) or flat bottom holes. The remaining images have simulated indications, meaning that image processing was used to superimpose a rejectable indication shape onto an otherwise clean background C-Scan of a forging. Simulating images to supplement the production images was necessary to increase the development set size for the algorithm, since production images with rejectable indications are inherently sparse; very few defects are created in the forgings. The rejection criterion for this study is an indication with SNR greater than 2.5. Some images in this database have indications whose SNR measures above 2.5 and some do not. The ground truth SNR for these images was determined by manually scrutinizing each image with the input of ultrasonic Level III inspectors. The output of the ADR software on these images contains the segmented indications and their respective SNRs.
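The superposition step, adding a rejectable indication shape to a clean background C-Scan, might look like the following sketch. The Gaussian-shaped amplitude bump is an illustrative stand-in, since the article does not detail the actual superposition method:

```python
import numpy as np

def superimpose_indication(clean_cscan, center, radius=2.0, amplitude=15.0):
    """Add a synthetic rejectable indication to a clean C-Scan.

    A small Gaussian amplitude bump stands in for the indication shape;
    `center`, `radius`, and `amplitude` are illustrative parameters, not
    those used in the validation study.
    """
    rows, cols = np.indices(clean_cscan.shape)
    r0, c0 = center
    bump = amplitude * np.exp(
        -((rows - r0) ** 2 + (cols - c0) ** 2) / (2.0 * radius ** 2)
    )
    return clean_cscan + bump

# A clean background at amplitude 10 gains a bright spot peaking at 25.
sim = superimpose_indication(np.full((20, 20), 10.0), center=(10, 10))
```

Because the bump rides on a real (here, synthetic) background, the simulated image stresses the same inhomogeneity handling as a production image with a true defect.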

The second set of images that defines the validation data for ADR is the set of 8,894 production inspection images used to rigorously test ADR for false positive rate (FPR) and probability of detection (POD). Twenty-four part numbers were used from various commercial engine lines, so the algorithm was tested across varying titanium alloys (Ti-17, Ti-6-4, Ti-6-2-4-2) and varying forging sizes. A total of 247 serial numbers were tested.

The validation procedure for all 247 serial numbers was carried out identically. A data image was presented to the ADR algorithm, which returned the number of indications it detected along with their SNR values. If indications were detected (i.e., ADR’s SNR was above 2.5), the operator then drew manual signal and noise boxes as in a normal inspection, per the specification. If the operator’s SNR was above 2.5, this was considered a true positive. If the operator’s SNR was below 2.5 (i.e., ADR flagged an indication the operator did not confirm), this was considered a false positive. The operator then scanned the image for any further potential indications (i.e., indications that ADR did not find); if the measured SNR of such an indication was above 2.5, this was considered a false negative. The table that describes this procedure is shown in Figure 2. Essentially, ADR was measured against the operator such that operator results were considered ground truth.
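This decision procedure can be written down directly as a small scoring function. This is a sketch of the logic only; the function name and signature are illustrative, and the real study scores each candidate location within an image:

```python
def classify(adr_snr, operator_snr, threshold=2.5):
    """Score one candidate indication against operator ground truth.

    Either SNR may be None when that party found no indication at the
    location. Follows the validation procedure in the text: operator
    measurements are treated as ground truth.
    """
    adr_hit = adr_snr is not None and adr_snr > threshold
    op_hit = operator_snr is not None and operator_snr > threshold
    if adr_hit and op_hit:
        return "true positive"
    if adr_hit and not op_hit:
        return "false positive"   # ADR flagged it; operator did not
    if op_hit and not adr_hit:
        return "false negative"   # operator found it; ADR missed it
    return "true negative"
```

For example, `classify(2.9, 2.1)` scores as a false positive, the case where ADR flags an indication the operator measures below threshold.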


Table 1 and Table 2 show the results of this validation procedure, separated into the in-house analysis and the production analysis performed at the inspection supplier. It is important at this point to introduce the concept of relevant and non-relevant indications. Some indications that are identified as rejectable by the ADR software or an operator may later be classified as non-relevant. A non-relevant indication in ultrasonic testing is one that is caused not by an internal feature of the test part but by an external reflector, such as dirt in the coupling medium, instrument noise, air bubbles on the surface of the test part, or geometric features such as edges or corners on the test part. The material specifications are only concerned with rejectable indications caused by internal conditions such as voids, inclusions, or large grain colonies, which are referred to as relevant indications.

In addition to false and true positives, Tables 1 and 2 break the results down further into relevant and non-relevant indications. Table 1 shows seven instances in which ADR identified an indication as rejectable while the operator measured an SNR below 2.5; all seven were caused by external features and were therefore non-relevant. Likewise, Table 2 shows that the ADR software correctly identified 1,081 rejectable indications, 85 of which were relevant.


As a side effect of the ADR validation, it was observed that ADR greatly reduced the cycle time of evaluating a C-Scan. On average, it took an operator 100 seconds to evaluate an image manually: scanning the image for potential indications, drawing a signal box around each candidate, and drawing a homogeneous noise box for each. With ADR, running the algorithm and deciding whether the image passed took an average of 4 seconds.


Conclusion
The results shown in Table 1 and Table 2 make it easy to calculate ADR’s empirical POD and FPR, since a population of nearly 9,000 images constitutes a statistically significant sample, as demonstrated using the procedure in Bradley and Longstaff [4]. From Table 1, the FPR in terms of relevant indications found in this study was zero. Considering all signals, one image in 1,280 can be expected to contain a false positive, albeit a non-relevant one due to edge effects, instrument noise, etc. From Table 2, the POD (considering all true positives) is 100%, since all 1,082 total indications were detected.
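The empirical rates follow by simple division from the counts reported in the validation section; a minimal computation using those totals (and noting that all seven false positives were non-relevant, so the relevant-only FPR is zero):

```python
# Totals reported in the validation study (Tables 1 and 2 of the text).
n_images = 8894          # production inspection images evaluated
false_positives = 7      # all non-relevant (edge effects, noise, etc.)
detected = 1082          # indications found by ADR
total_indications = 1082 # indications confirmed by the operator

pod = detected / total_indications          # probability of detection
fpr_per_image = false_positives / n_images  # false positives per image
```

With every confirmed indication detected, the POD is exactly 1.0, and the per-image false positive rate is well under one in a thousand images.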

A screen shot of the software is shown in Figure 3. It is a plugin built on GEIT’s Rhythm Review image analysis package. Key features are the ability to perform ADR on a suite of images across multiple parts at once, image analysis productivity tools, standard data retention and storage in the DICONDE format [5], and an automatically generated report for each part inspected.

Since there were no relevant false positives during the 24-week validation study using this software, no parts were sent for further engineering evaluation because of ADR. In fact, thanks to its low FPR, ADR has delivered a drastic reduction in cases needing further evaluation while providing confidence in the SNR outputs, which saves inspection cost. However, it is the very high POD that renders it a certifiable means for production ultrasonic inspection of titanium forgings.


GE Aviation
Cincinnati, OH
geaviation.com


REFERENCES

1. P. J. Howard, D. C. Copley, E. J. Nieters, J. D. Young, M. E. Keller, and R. S. Gilmore, “Ultrasonic Inspection of Cylindrical Titanium Billet,” in American Society of Nondestructive Testing Fall Conference, 1994.
2. P. J. Howard, D. C. Copley, and R. S. Gilmore, “The Application of a Dynamic Threshold to C-Scan Images with Variable Noise,” in Review of Progress in QNDE, 17B, edited by D. O. Thompson and D. E. Chimenti, Plenum Press, New York, 1998, pp. 2013-2019.
3. A. F. Ferro and P. J. Howard, “Assisted Defect Recognition for the Ultrasonic Inspection of Titanium Forgings,” in Review of Progress in QNDE, 25A, edited by D. O. Thompson and D. E. Chimenti, American Institute of Physics Conference Proceedings, Melville, New York, 2009, pp. 627-633.
4. A. P. Bradley and I. D. Longstaff, “Sample Size Estimation Using the Receiver Operating Characteristic Curve,” in Proceedings of the 17th International Conference on Pattern Recognition, IEEE Computer Society, 2004, 1051-4651/04.
5. P. Howard, L. Arrowood, M. Jobst, and J. Hansen, “A Standard Practice for Digital NDT Data Exchange and Storage,” in Materials Evaluation, ASNT, 2010, pp. 319-325.

May June 2011