Properties and Performance of a Center/Surround Retinex
Daniel J. Jobson, Zia-ur Rahman, Member, IEEE, and Glenn A. Woodell
Abstract: The last version of Land’s retinex model for human vision’s lightness and color constancy has been implemented and tested in image processing experiments. Previous research has established the mathematical foundations of Land’s retinex but has not subjected his lightness theory to extensive image processing experiments. We have sought to define a practical implementation of the retinex without particular concern for its validity as a model for human lightness and color perception. Here we describe the trade-off between rendition and dynamic range compression that is governed by the surround space constant. Further, unlike previous results, we find that the placement of the logarithmic function is important and produces best results when placed after the surround formation. Also unlike previous results, we find best rendition for a “canonical” gain/offset applied after the retinex operation. Various functional forms for the retinex surround are evaluated, and a Gaussian form is found to perform better than the inverse square suggested by Land. Images that violate the gray world assumptions (implicit to this retinex) are investigated to provide insight into cases where this retinex fails to produce a good rendition.
I. INTRODUCTION
Of the many visual tasks accomplished so gracefully by human vision, one of the most fundamental and approachable for machine vision applications is lightness and color constancy. While a completely satisfactory definition is lacking, lightness and color constancy refer to the resilience of perceived color and lightness to spatial and spectral illumination variations. Various theories for this have been proposed and have a common mathematical foundation [1]. The last version of Land’s retinex [2] has captured our attention because of the ease of implementation and manipulation of key variables, and because it does not have “unnatural” requirements for scene calibration. Likewise, the simplicity of the computation was appealing and initial experiments produced compelling results. This version of the retinex has been the subject of previous digital simulations that were limited because of the lengthy computer time involved, and it was implemented in analog very large-scale integrated circuits (VLSI) to achieve real-time computation [3], [4]. Evidence that this retinex version is an optimal solution to the lightness problem has come from experiments posing Land’s Mondrian target, randomly arranged two-dimensional (2-D) gray patches, as a problem in linear optimization and a learning problem for back propagated artificial neural networks [5], [6].
Fig. 1. Spatial form of the center/surround retinex operator. (a) 3-D representation (distorted to visualize surround). (b) Cross-section to illustrate wide weak surround.
Manuscript received June 26, 1995; revised May 24, 1996. The work of Z. Rahman was supported by NASA Langley Research Center under Contract NAS1-19603. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Moncef Gabbouj.
D. J. Jobson and G. A. Woodell are with NASA Langley Research Center, Hampton, VA 23681-0001 USA (e-mail: d.j.jobson@v).
Z. Rahman is with Science and Technology Corporation, Hampton, VA 23666 USA.
Publisher Item Identifier S 1057-7149(97)00428-4.
The utility of a lightness–color constancy algorithm for machine vision is the simultaneous accomplishment of:
1) dynamic range compression;
2) color independence from the spectral distribution of the scene illuminant;
3) color and lightness rendition.
Land’s center/surround retinex demonstrably achieves the first two, although Land emphasized primarily the color constancy properties. Well-known difficulties arise, though, for color and lightness rendition [1], [3], [6]. These consist of i) lightness and color “halo” artifacts that are especially prominent where large uniform regions abut to form a high contrast edge with “graying” in the large uniform zones in an image, and ii) global violations of the gray world assumption (e.g., an all-red scene) which result in a global “graying out” of the image. Clearly, the retinex (perhaps like human vision) functions best for highly diverse scenes and poorest for impoverished scenes. This is analogous to systems of simultaneous equations where a unique solution exists if and only if there are enough independent equations.
1057–7149/97$10.00 © 1997 IEEE
Fig. 2. Demonstration of retinex color constancy and dynamic range compression (prior to optimizing rendition) for a Gaussian surround with small space constant (15 pixels).
The general form of the center/surround retinex (Fig. 1) is similar to the difference-of-Gaussian (DOG) function widely used in natural vision science to model both the receptive fields of individual neurons and perceptual processes. The only extensions required are i) to greatly enlarge and weaken the surround Gaussian (as determined by its space and amplitude constants), and ii) to include a logarithmic function to make subtractive inhibition into a shunting inhibition (i.e., arithmetic division). We have chosen a Gaussian surround form whereas Land opted for a $1/r^2$ function [2] and Moore et al. [3] used a different exponential form. These will be compared in Section II. Mathematically, this takes the form
$$R_i(x, y) = \log I_i(x, y) - \log[F(x, y) * I_i(x, y)] \tag{1}$$
where $I_i(x, y)$ is the image distribution in the $i$th color spectral band, “$*$” denotes the convolution operation, $F(x, y)$ is the surround function, and
(2)
where
JOBSON et al.:CENTER/SURROUND RETINEX
Fig. 3. Examples of serious photographic defects due to spectral and/or spatial illumination variations. (a) “Green” kitchen due to fluorescent illumination. (b) Sodium vapor illumination. (c) Tungsten indoors/daylight outdoors. (d) Obscured foreground.
The need for dynamic range compression and color constancy, especially if both are accomplished simultaneously by a simple real-time algorithm, is well known to photographers. Discrepancies between the photographer’s perception through the viewfinder and the captured film image can be quite bizarre (Fig. 3), and require constant vigilance to avoid impossible lighting situations and to carefully select the appropriate film and processing for the illuminant’s spectral distribution. The fundamental limit [3] is recognized to be the film or cathode ray tube’s (CRT’s) narrow dynamic range and static spectral response. Print/display dynamic range constraints of 50:1 are, however, compatible with the magnitude of scene reflectance variations. Except for extreme cases (snow or lampblack), reflectance variations are only 20:1 [7] and often much less. Thus, even the extremes of reflectance [...] of 2000:1) [8] set by the detector array electronics, and an even higher dynamic range within the detector array proper, since the limiting factor is usually the preamplifier noise added in transferring image signals off-chip or digitization noise added subsequently. Therefore, at least for electronic still cameras, we can conclude that sufficient dynamic range is available to retain the full variations of both illumination and reflectance in arbitrary scenes. So it is certainly reasonable to consider either analog [3] implementations of compression/constancy or digital implementation if the initial A/D conversion is done at 10–14 bits (b), rather than the usual 8 b.
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 6, NO. 3, MARCH 1997
Fig. 4. Demonstration of improved rendition obtained applying the log response after surround formation ($c_3$ = 80 pixels).
Recent advances in high-speed computing led us to reconsider both extensive digital simulations of the retinex and real-time digital implementations for practical use in future electronic camera systems. The hours of computer time previously reported [3] are now reduced to minutes and real-time implementations using specialized digital hardware such as digital signal processing (DSP) chips seem reasonable. In other words, the full image dynamic range is available from current electronic cameras, real-time computation is realizable, and the ultimate bottleneck is only at the first print/display. Obviously, there are image coding aspects to both dynamic range compression and color constancy. We will touch upon these briefly but concentrate primarily on the design of the algorithm to produce combined dynamic range compression/color constancy/color–lightness rendition.
We have seen that the center/surround retinex is both color constant and capable of a high degree of dynamic range compression. It remains, then, to specify an implementation that produces satisfactory rendition and examine alternatives to determine if other design options are equally good or better. Because the retinex exchanges illumination variations for scene reflectance context dependency [9], scene content becomes a major issue especially when it deviates from regionally gray average values (the “gray world” assumption [1]). Therefore, testing with diverse scenes, including random ones, is important to pinpoint possible limits to the generality of this retinex.
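As a rough illustration of these two properties, the sketch below applies a single-scale retinex of the form log I - log(F * I) to a synthetic three-band image. It is only a minimal NumPy sketch under stated assumptions, not the implementation used for the experiments reported here: the function names `gaussian_surround` and `single_scale_retinex`, the wrap-around (FFT) border handling, the random test scene, and the illuminant gains are all illustrative choices. The 15-pixel space constant is borrowed from Fig. 2.

```python
import numpy as np

def gaussian_surround(shape, c):
    """Gaussian surround exp(-r^2 / c^2) on a wrap-around grid, scaled to unit sum.

    The kernel is centered at index (0, 0) so it can be used directly with FFTs.
    """
    rows = np.fft.fftfreq(shape[0]) * shape[0]  # signed offsets 0, 1, ..., -1
    cols = np.fft.fftfreq(shape[1]) * shape[1]
    r2 = rows[:, None] ** 2 + cols[None, :] ** 2
    f = np.exp(-r2 / c ** 2)
    return f / f.sum()

def single_scale_retinex(rgb, c):
    """Per-band log(I) - log(F * I).

    Assumes strictly positive input and circular (wrap-around) borders.
    """
    f_hat = np.fft.fft2(gaussian_surround(rgb.shape[:2], c))
    out = np.empty_like(rgb, dtype=float)
    for i in range(rgb.shape[2]):
        band = rgb[..., i].astype(float)
        surround = np.real(np.fft.ifft2(np.fft.fft2(band) * f_hat))
        out[..., i] = np.log(band) - np.log(surround)
    return out

rng = np.random.default_rng(0)
scene = rng.uniform(10.0, 200.0, size=(64, 64, 3))  # synthetic "diverse" scene

# Color constancy: a spatially uniform change of illuminant color scales each
# band by a constant, which cancels between the center and surround terms.
illuminant = np.array([1.8, 1.0, 0.4])  # hypothetical per-band gains
shift = np.abs(single_scale_retinex(scene * illuminant, 15.0)
               - single_scale_retinex(scene, 15.0)).max()

# Gray-world violation: a uniformly red scene gives zero output in every band,
# so a common gain/offset would display it as a uniform gray.
red = np.empty((64, 64, 3))
red[...] = [200.0, 20.0, 20.0]
red_out = single_scale_retinex(red, 15.0)

print(f"max change under illuminant shift: {shift:.2e}")
print(f"max |output| for all-red scene:   {np.abs(red_out).max():.2e}")
```

In this sketch a spatially uniform illuminant change cancels up to rounding error. Spatially varying illumination would cancel only approximately, to the extent that it is smooth over the extent of the surround.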
Initial image processing simulations revealed the following unresolved implementation issues:
1) the placement of the log function;
2) the functional form of the surround;
3) the space constant for the surround;
4) the treatment of the retinex triplets prior to display.
These will now be explored more comprehensively. The results of testing the optimized algorithm on diverse scenes will then be presented with special emphasis on “gray-world” violations. Finally, the relationship of the algorithm to neurophysiology will be examined briefly.
II. ISSUES
A. Placement of Log Function
Previous research [3], [6] has largely concluded that the logarithm can be taken before or after the formation of the surround. Processing schemes [3], [6], [10] adhering closely to natural vision, with its approximate log photoreceptor response, favor placing the log response at the photodetection stage prior to any surround formation. Our preliminary testing of this produced rather disappointing results and prompted us to reopen this seemingly decided issue. Initial testing of the postsurround log produced encouraging results with much less emphatic artifacts. Mathematically, we have that
$$R_i(x, y) = \log I_i(x, y) - \log[F(x, y) * I_i(x, y)] \tag{5}$$
and
$$R_i(x, y) = \log I_i(x, y) - [\log I_i(x, y)] * F(x, y) \tag{6}$$
are not equivalent. The discrete convolution $[\log I_i(x, y)] * F(x, y)$ corresponds to a weighted product of the $I_i(x, y)$ values, whereas the second term in (5) is a weighted sum. This is closely related to the difference between the arithmetic mean and the geometric mean except that $F(x, y)$ is selected so that
$$\iint F(x, y)\,dx\,dy = 1 \tag{7}$$
which does not produce exactly the $n$th root of $n$ numbers as the geometric mean would. Since the entire purpose of the log operation is to produce a point by point ratio to a large regional mean value, (5) seems the desired form and our image processing experiments bear out this preference. A typical example is shown in Fig. 4. While the halo artifact for (6) can be diminished by manipulation of the gain and offset, this results in a significant desaturation of color. In other examples, more severe color distortions occur, which likewise cannot be removed by manipulation of the gain/offset. In addition, a shadow simulation indicates much less dynamic range compression for (6). Therefore, we have selected the (5) form for our testing and optimization. This form is also that given in Land’s original presentation [2], though he is quoted as feeling the two forms were equally useful in practice [6].
Fig. 5. Comparison of three surround functions (inverse square, exponential, and Gaussian) normalized to equal full-width half-max (FWHM) response. The log(r) scale is necessary for comparison purposes but does diminish the differences between the functions. A linear r scale (if it were graphically feasible) would show very dramatic differences. The space constants are $c_1$ = 50 pixels, $c_2$ = 72 pixels, and $c_3$ = 60 pixels.
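The non-equivalence of the two log placements, (5) and (6), can be checked numerically. The following sketch assumes a unit-sum Gaussian surround and circular (FFT) convolution on a random positive test image; the helper names and the test data are hypothetical and not taken from the experiments described here. Because the log of a weighted arithmetic mean is never smaller than the weighted mean of the logs, the log-after-surround output should never exceed the log-before-surround output.

```python
import numpy as np

def gaussian_surround(shape, c):
    """Gaussian surround exp(-r^2 / c^2), wrap-around grid, scaled to unit sum."""
    rows = np.fft.fftfreq(shape[0]) * shape[0]  # signed offsets 0, 1, ..., -1
    cols = np.fft.fftfreq(shape[1]) * shape[1]
    r2 = rows[:, None] ** 2 + cols[None, :] ** 2
    f = np.exp(-r2 / c ** 2)
    return f / f.sum()

def circular_convolve(image, kernel):
    """2-D circular convolution via the FFT (an assumed border treatment)."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

def log_after_surround(image, c):
    """Log taken after the surround is formed: log I - log(F * I)."""
    surround = circular_convolve(image, gaussian_surround(image.shape, c))
    return np.log(image) - np.log(surround)

def log_before_surround(image, c):
    """Log taken before the surround is formed: log I - (log I) * F."""
    surround = circular_convolve(np.log(image), gaussian_surround(image.shape, c))
    return np.log(image) - surround

rng = np.random.default_rng(1)
image = rng.uniform(1.0, 255.0, size=(64, 64))  # strictly positive test image

# gap = log(weighted arithmetic mean) - log(weighted geometric mean) >= 0
gap = log_before_surround(image, 80.0) - log_after_surround(image, 80.0)
print(f"gap range: {gap.min():.4f} to {gap.max():.4f}")
```

The two forms coincide only where the image is constant over the surround. Everywhere else the surround term of the log-before form is a weighted geometric mean, which is smaller than the arithmetic mean used by the log-after form.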
B. The Surround Function
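Land proposed an inverse square spatial surround
$$F(x, y) = 1/r^2 \tag{8}$$
where
$$r = \sqrt{x^2 + y^2} \tag{9}$$
which can be modified to be dependent on a space constant as
$$F(x, y) = \frac{1}{1 + r^2/c_1^2}. \tag{10}$$
Moore et al. [3] used the exponential
$$F(x, y) = e^{-|r|/c_2} \tag{11}$$
because it is an approximation to the spatial response of analog VLSI resistive networks, and Hurlbert [6] investigated the Gaussian:
$$F(x, y) = e^{-r^2/c_3^2}. \tag{12}$$
[...] of visual angle. This corresponds to a FWHM of about 270 visual pixels (assuming a visual pixel is [...]).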
We examined the performance of the Gaussian surround over a wide range of space constants. Since previous research [6] found variations in the space constant with the spatial variation in shadow profiles, a particular concern is the question of an optimum space constant that gives good performance for diverse scenes and lighting conditions.
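The space constants quoted in the Fig. 5 caption can be sanity-checked against the three surround forms. The sketch below assumes the forms 1/(1 + r^2/c1^2) for the modified inverse square, exp(-|r|/c2) for the exponential, and exp(-r^2/c3^2) for the Gaussian. These expressions, the helper names, and the grid resolution are assumptions made for illustration. It measures each FWHM numerically on a fine radial grid.

```python
import numpy as np

def inverse_square(r, c1):
    """Modified inverse-square surround, unit peak at r = 0 (assumed form)."""
    return 1.0 / (1.0 + (r / c1) ** 2)

def exponential(r, c2):
    """Exponential surround with absolute value (assumed form)."""
    return np.exp(-np.abs(r) / c2)

def gaussian(r, c3):
    """Gaussian surround exp(-r^2 / c3^2) (assumed form)."""
    return np.exp(-(r / c3) ** 2)

def fwhm(func, c, r_max=1000.0, n=200001):
    """Full width at half maximum of a radially decreasing profile.

    Finds the first grid radius where the profile drops to half its peak
    value of 1 and doubles it; accuracy is limited by the grid step.
    """
    r = np.linspace(0.0, r_max, n)
    half_radius = r[np.argmax(func(r, c) <= 0.5)]
    return 2.0 * half_radius

# Space constants taken from the Fig. 5 caption (in pixels).
for name, func, c in [("inverse square", inverse_square, 50.0),
                      ("exponential", exponential, 72.0),
                      ("Gaussian", gaussian, 60.0)]:
    print(f"{name:15s} c = {c:5.1f}  FWHM = {fwhm(func, c):.2f} pixels")
```

Under these assumed forms the three widths agree to within a fraction of a pixel, which is consistent with the caption's statement that the functions are normalized to equal FWHM.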
The image sequence (Fig. 7) established a trade-off that has not been previously studied. In varying the space constant from small to large values, dynamic range compression is sacrificed for improved rendition. The middle of this range
(50