AUTOMATIC, REAL-TIME, SUPERIMPOSED LABELING OF POINTS AND OBJECTS OF INTEREST WITHIN A VIEW
The present invention relates to superimposing, onto an image, labels of points and objects appearing in the image. More particularly, the present invention relates to automatically determining, in real-time, whether a predetermined point or object of interest appears in an image and, if so, superimposing the corresponding label onto the image.

It is known to determine the location and orientation of a camera taking pictures. The registration of location for the purpose of easy browsing and retrieval of the pictures at a later time is described in U.S. Patent Nos. 5,506,644 to Suzuki et al. and 6,222,985 to Miyake
(hereinafter "Miyake"), the disclosure of the latter being hereby incorporated by reference in its entirety. It is also known to add direction/orientation information to pictures. For instance, U.S. Patent No. 5,614,981 to Bryant et al. (hereinafter "Bryant"), the disclosure of which is hereby fully incorporated by reference herein, proposes a camera with a magnetic orientation sensor. U.S. Patent No. 5,659,805 to Furlani et al. describes a camera for indicating camera orientations on photographic film, and U.S. Patent No. 5,699,444 to Palm describes how image data (known objects on the photograph whose exact locations are known) can be used to determine the orientation of the camera. U.S. Patent No. 4,183,645 to Ohmura et al. relates to detecting the orientation of the photo (portrait or landscape) in order to add the date and time in the appropriate orientation. As a further application regarding orientation, U.S. Patent No. 6,304,284 to Dunton et al. (hereinafter "Dunton"), whose disclosure is herein incorporated by reference in its entirety, uses orientation information to create panoramic images with a camera.

Global positioning system (GPS) receivers, which deliver real-time ground coordinate and altitude readings, can now be integrated into a single chip and built, at low cost, into all kinds of portable devices such as mobile phones and digital cameras. There exists, furthermore, an observable trend of more and more digital photos and photo albums being shared on the Internet. What is needed is a way to utilize readily accessible location and orientation data in real-time to enhance the experience of taking a photo. The present invention has been made to address the above-noted shortcomings in the prior art.
It is an object of the invention to provide a device, a method operable on the device, and a computer program for performing the method, wherein a device usable with an image-viewing apparatus marks, automatically in real-time, an image captured from a viewpoint.
The image is bounded in two dimensions by a field of view at a particular orientation. The device communicates with a storage medium for storing location information of a point of interest and/or an object of interest. The device includes a processing unit configured to, automatically in real-time, update a value of at least one parameter that defines the capturing of the image from the viewpoint. The processing unit also determines, based on the location information and at least one updated value, whether the respective point or object is disposed within the field of view at said viewpoint at said particular orientation. Further included in the device is a label superimposing unit configured to, automatically in real-time, based on the determination that the respective point or object is so disposed, and under control of the processing unit, superimpose a label onto the image to identify the respective point or object.

Details of the invention disclosed herein shall be described with the aid of the figures listed below, wherein: FIG. 1 is a schematic diagram of a real-time label superimposition system in accordance with the present invention; FIG. 2 depicts how an image with superimposed labels appears to the user through the view finder of a camera in accordance with the present invention; FIG. 3 is a conceptual diagram of a two-dimensional field of view in accordance with the present invention; FIG. 4 is a flowchart of a process for superimposing a label for a point/object of interest within the viewed image in accordance with the present invention; FIG. 5 is a depiction of a sub-process of mapping a stored shape to the viewed image in accordance with the present invention; FIG. 6 is a flowchart of a sub-process for refining label placement in accordance with the present invention using the sub-process depicted in FIG. 5; FIG.
7 is a flowchart of a method for retrieving labels and associated metadata to be displayed upon playback or re-presentation of recorded images; and FIG. 8 is a flowchart of a process for retaining label information for subsequent re-presentation of the stored images.

FIG. 1 shows, by way of illustrative and non-limitative example, a configuration for a real-time label superimposition system 100 according to the present invention. The real-time label superimposition system 100 can be realized, for example, as a modified form of the digital camera in FIG. 1 of the above-mentioned Miyake patent. The system 100 includes front-end optics 104, a view finder 108, a GPS unit 112, a liquid crystal display (LCD) monitor 116, an orientation sensor 120, a processing unit or processor 124, a user input device 128 and a world model 132, all connected on a data and control bus 136.
As in Miyake, the front-end optics 104 may include a taking lens comprising a focus lens and a variable magnification lens, a diaphragm, and an optical low-pass filter, these being arranged in series. The view finder 108 has a front-end LCD 110, as in Miyake, but this LCD functions as a label superimposition unit. As such, the LCD 110 is preferably extended across the entire viewing lens of the view finder 108 to allow label superimposition anywhere in the image. The GPS unit 112 is similar to that in Miyake, continuously providing updates as to ground coordinates and altitude. The LCD monitor 116 is optionally provided with the capability to display the superimposed labeling, which has been embedded into the processing unit's video output to the LCD monitor. The orientation sensor 120 detects camera tilt and horizontal rotation, as by the use of micro-machined silicon ("MEMS") sensors. Such sensors are discussed in the Dunton patent. One sensor is preferably utilized to detect tilt and another to detect rotation. Alternatively, the tilt sensor of a conventional film camera described in the Bryant patent can be implemented. It is noted here that, in addition to orientation, the zooming/magnification factor of the variable magnification lens in the front-end optics 104 must be known to determine whether a point of interest or object of interest is currently within the field of view of the camera 100, as discussed below in further detail. Processing unit 124 may include the Miyake strobe unit, charge-coupled device
(CCD) unit, image pickup unit, camera unit, display unit, process unit and output unit. Accordingly, the image emitted from the low-pass filter in the front-end optics 104, as well as being displayed in the view finder 108 after label superimposition, is incident upon the CCD unit within the processing unit 124. The latter is necessary for storage of the image as a digital photo and for display on the LCD monitor 116. The Miyake switch unit may be integrated into the user input device 128 of the present invention. Along with a release switch for taking photos, the switch unit includes a user-operable zooming/magnification factor selector. The user input device 128 preferably includes, in addition, a wheel, button or other mechanism for choosing among displayed labels in the image, and possibly other input means by which to filter, by classification, labels to be displayed. As an example of the latter, the user can filter for tourist information (e.g., "this is the Eiffel Tower"), topographical information (e.g., "Paris, with 10,000,000 inhabitants"), and so on. The user input device 128 may additionally feature a mechanism, such as a trackball, joystick or rocker switch, by which to navigate a cursor across the screen of the LCD monitor 116. This user interface can be utilized, for example, to limit concurrent display of labels to
points/objects of interest in the immediate vicinity of the cursor, so as not to overcrowd the image with labels.

World model 132 is a database, which may be incorporated into the Miyake PC card. The database 132 includes points of interest and objects of interest 140, and optionally shapes 144 of the objects. Loading of the database 132 ahead of time is especially useful if the camera is integrated with a personal digital assistant (PDA). The database 132 may alternatively be accessed via a wireless connection to the system or camera 100 from an Internet service affording WiFi (IEEE 802.11 wireless Ethernet) communication. Such access is especially useful if the camera 100 is integrated with a mobile phone. As a further alternative, this "travel-guide" information can be loaded into another portable device such as a PDA, which can wirelessly connect to the camera 100 via a Body Area Network or Personal Area Network. Processing itself may also be off-loaded to a remote server (not shown) to reduce form factor and power requirements for the mobile image-viewing device. Updates from the GPS unit 112 and orientation sensor 120, for example, can be wirelessly transmitted to the server. The server determines whether objects/points of interest are within the field of view and, if so, where. The server transmits this determined data back to the camera 100 so that the processor 124 can appropriately operate the view finder 108 to superimpose labeling.

FIG. 2 portrays two exemplary views 204, 208 from a digital camera incorporating the real-time label superimposition system 100 of the present invention. The left-hand view 204 shows a label 212 having a stem portion 216 and containing a description 220 of an object of interest, namely the Matterhorn mountain. Other labels 224 are positioned near other respective objects or points of interest. An example of a point of interest is a particular location within a nature trail that affords a desirable view of a canyon.
By manipulating the user input device 128, the user can switch to any of the other labels 224, thereby generating a descriptive label at the chosen location. Accordingly, the right-hand view 208 displays a label 228 that identifies Mont Blanc. In the case of the nature trail, for instance, the descriptive label may provide a number or other identifier that matches with a travel guide or hiking route. The auto-labeling function may be editable by the user by means of the user input device 128. Thus, the user could add information to labels or even create new labels such as "Home," "starting point of the trip," "my hotel room," and so forth. Non-descriptive labels appear only when more than one point/object of interest is concurrently displayed. Otherwise, if only a single point/object of interest is displayed, the only label appearing is the descriptive label. Alternatively, the user might override the default to allow more than one descriptive label, if applicable, to be displayed concurrently.
The LCD 110 in the view finder 108, responsive to electrical signals, selectively obscures incident light to form the letters 232 "Mont Blanc" and the label outline 236. Optionally, the obscuring elements may be colored to form a colored label within the image 208. As an alternative to the use of LCDs, other light sources such as light-emitting diodes (LEDs) may be used in combination with panels that are transmissive in one direction and reflective in the other direction to introduce into a common light path both the original image and the superimposed label.

FIG. 3 conceptually depicts an example of a field of view in two dimensions in accordance with the present invention. Since the image 302 as shown in FIG. 3 is rectangular, i.e., viewable through a rectangular viewing lens of the view finder 108, the field of view is defined simply by a horizontal sector angle 303 and a vertical sector angle 304 and, more generally, by the focal length of the taking lens in the front-end optics 104 and by the horizontal and vertical dimensions 306, 308 of the viewing lens of the view finder 108. Such truncation of the image 302 to the dimensions 306, 308 may occur anywhere in the light path, starting at the front end and continuing through the view finder 108, and is shown in FIG. 3 at a particular location merely for simplicity of demonstration. In effect, the field of view is defined by exterior walls 312, 316, 320 of a solid figure, in this case a pyramid 324, whose apex coincides with the viewpoint 328, such that the walls extend through a periphery 332 of the image 302. The apex or viewpoint 328 is the optical center of the taking lens in the front-end optics 104. A focused rendition of the image formed at a focal length beyond the viewpoint 328 is routed to the view finder 108, where a label may be superimposed according to the invention. That rendition is also made incident upon a front surface of the CCD within the processing unit 124.
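The relation between the focal length, the viewing dimensions and the two sector angles described above follows the standard pinhole-camera angle-of-view formula. The following sketch, offered only as an illustration and not as part of the claimed invention, computes the horizontal and vertical sector angles; the 50 mm / 36 mm x 24 mm figures in the usage note are hypothetical values, not taken from the patent:

```python
import math

def sector_angles(focal_length, view_w, view_h):
    """Horizontal and vertical sector angles (radians) of a rectangular
    field of view, given the taking lens's focal length and the viewing
    dimensions (standard angle-of-view formula: 2 * atan(d / (2 * f)))."""
    return (2 * math.atan(view_w / (2 * focal_length)),
            2 * math.atan(view_h / (2 * focal_length)))
```

For a hypothetical 50 mm focal length and a 36 mm x 24 mm viewing area, this yields sector angles of roughly 39.6 and 27.0 degrees; increasing the focal length (zooming in) narrows both angles, consistent with the variable-magnification discussion below.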
The variable magnification or "zoom" lens in the front-end optics 104 features multiple different focal lengths, each entailing a different field of view. Although the field of view is defined in FIG. 3 by a pyramidal shape, implementation of the invention in, for example, binoculars would limit the field of view in accordance with a conical shape, e.g., to a single, uniform sector angle. The field of view, more generally, may be defined with respect to any arbitrary shape for the view finder 108. Based on the current ground coordinates and elevation provided by the GPS unit 112, and the tilt and rotational angle provided by the orientation sensor 120 and represented by the arrow 336, the processing unit 124 can determine by means of known geometrical transformations which points/objects of interest 140, if any, within the world model 132 are currently within the current field of view. For example, as shown in FIG. 3, an object 328 at world coordinates (X,Y,Z) is currently within the field of view and displayed in the view finder 108 as object image 332 at image or view finder coordinates (x,y). Accordingly, the label (not shown)
stored in the world model 132 for object 328 is automatically placed in real-time adjacent to the object image 332 to identify the object. Preferably, the world model 132 includes topographical data for the objects/points of interest 140 so that the system 100 accounts for the possibility of occlusion of the object/point behind an object in the foreground. Such modeling techniques are often utilized, for example, in on-screen portrayal of three-dimensional objects in computer graphics applications, such as computer animations. If the object/point is totally occluded, either the object/point is not labeled or, optionally, the stem on the label may be dotted to indicate occlusion.

FIG. 4 sets forth one example of a real-time label-superimposing process 400 for a point/object of interest within the viewed image in accordance with the present invention. The device of the present invention may be incorporated into a camera, camcorder, binoculars or night-viewing equipment, for example. If the image-viewing apparatus is on (step S404), the viewpoint and orientation are read. Both of these parameters may be repeatedly updated, synchronously or asynchronously, and buffered for reading. In addition, the most-recently buffered zooming/magnification factor can be read, to determine the most recent output of the user-operable zooming/magnification factor selector (step S408). For each point/object of interest in the world model 132 that has not already been linked to a label (as will be explained), a decision is made as to whether the point/object is within the current field of view (FOV) at the current viewpoint at the current orientation (step S412). This database search may, alternatively, be limited to points/objects within the FOV that are approximately at a particular distance from the viewpoint.
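The geometrical transformation underlying the in-view decision of step S412 can be sketched, for illustration only, as a simplified pinhole projection. The axis conventions (x-east, y-north, z-up), the yaw/tilt parameterization and all function names below are illustrative assumptions, not the patent's own specification:

```python
import math

def project(cam, yaw, tilt, focal_len, view_w, view_h, point):
    """Decide whether a world point falls within the pyramidal field of
    view and, if so, return its image-plane coordinates (x, y).
    cam/point are (X, Y, Z) triples; yaw is horizontal rotation from
    north and tilt is elevation, both in radians."""
    # Camera basis vectors derived from the orientation readings.
    right = (math.cos(yaw), -math.sin(yaw), 0.0)
    forward = (math.sin(yaw) * math.cos(tilt),
               math.cos(yaw) * math.cos(tilt),
               math.sin(tilt))
    up = (-math.sin(yaw) * math.sin(tilt),
          -math.cos(yaw) * math.sin(tilt),
          math.cos(tilt))
    d = tuple(p - c for p, c in zip(point, cam))  # offset from viewpoint
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    z = dot(d, forward)                 # depth along the optical axis
    if z <= 0:
        return None                     # behind the viewpoint
    x = focal_len * dot(d, right) / z   # perspective projection
    y = focal_len * dot(d, up) / z
    if abs(x) > view_w / 2 or abs(y) > view_h / 2:
        return None                     # outside the field of view
    return (x, y)
```

A point straight ahead projects to the image center; a point behind the camera or outside the pyramidal walls yields no image coordinates, corresponding to a "not in FOV" decision.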
Thus, as in Miyake, a focusing sensor is typically provided, within the camera unit of the processing unit 124, for calculating a distance from the camera 100 at which image focusing occurs. Labeling can accordingly be limited to only those objects/points within the FOV that are also at that distance or within an interval centered at that distance. In any case, the location within the image 302 is determined for each point/object within the FOV, optionally inclusive of merely those disposed approximately at the calculated focusing distance. This determined location is used to update the average location, if any yet exists, for that point/object. An average location is desirable to reduce noise, which appears as random error in the readings from the orientation sensor 120 and the GPS unit 112. The randomness of the errors tends to make the errors offset each other when the readings are averaged. The appropriate label is then superimposed onto the image 302 for viewing by the user (step S416). As more and more readings for a given object/point are averaged, jitter in the label is mitigated and stability is achieved. When people take pictures, for example, it can easily take 5-10 seconds to find the right composition
and settings for the camera, whereas multiple readings for an average can be accumulated quickly, e.g., within a fraction of a second. Query is made by the processing as to whether the determined location for the object/point, and correspondingly for the label, is stable, based for instance on a predetermined number of readings (step S420). If more readings are needed, the process repeats, starting at step S404. Preferably, the needed number of iterations is achieved for a single point/object before processing a next point/object. If, on the other hand, stability has been achieved, a corresponding label is linked in fixed "on-screen" relation to tracked contours in the image 302 (step S424). Techniques for detecting contours in an image are well known. Tracking can be implemented using the same methodology conventionally applied in motion compensation, examining candidate images within a predetermined window in a subsequent frame for the best-matching image. Once the label is linked to the tracked contours, location averaging is no longer needed. The label stem continues to point to the object/point even if the camera 100 is moved by the user, as long as the point/object remains within the field of view. A separate process (not shown) can monitor linked labels, and drop the link whenever the point/object exits the field of view.

FIG. 4 represents merely an example of the specific design. Thus, for example, the query as to location stability (step S420) may, in addition to determining whether the average has been computed with the predetermined number of summands, also take into account detection by the MEMS sensors of recent significant camera movement. The repeated updating of parameters, as a further example, may include retention of respective rolling averages for the viewpoint and orientation, the rolling average having a set number of summands.
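The averaging of steps S416/S420, i.e., accumulating noisy projected locations until a predetermined number of readings is reached, can be sketched as a rolling average. This is an illustrative sketch only; the window size of 8 and the class/method names are assumptions, since the patent specifies only "a predetermined number of readings":

```python
from collections import deque

class LabelLocation:
    """Rolling average of a label's projected image location, with a
    simple stability test based on a predetermined number of readings."""

    def __init__(self, window=8):
        # Oldest readings fall out automatically, giving a rolling average.
        self.readings = deque(maxlen=window)

    def update(self, x, y):
        """Add one noisy reading; return the current average location."""
        self.readings.append((x, y))
        n = len(self.readings)
        return (sum(r[0] for r in self.readings) / n,
                sum(r[1] for r in self.readings) / n)

    def stable(self):
        """Stable once the predetermined number of readings is reached."""
        return len(self.readings) == self.readings.maxlen
```

Because the sensor errors are random, they tend to cancel in the average, and once `stable()` holds, the label can be linked to tracked contours and the averaging dropped, as in step S424.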
The query as to location stability in step S420 would then be reduced to identifying when stability of the rolling average had not yet been achieved, e.g., when the camera has just been turned on or when the MEMS sensors detect recent significant movement of the camera. In addition, the query as to the existence of non-linked points/objects within the FOV in step S412 may, in a single iteration, determine all such points/objects. Likewise, superimposition of labels in step S416, and label linking in step S424, would be performed in a single iteration.

FIG. 5 illustrates a stored-shape mapping sub-process 500 for mapping to the viewed image a shape 504 stored among the object shapes 144 in the world model 132 in accordance with the present invention. The sub-process 500 reduces systematic error in the orientation and GPS readings, and may be executed between steps S420 and S424 of the real-time label-superimposing process 400. The shape 504 of the object of interest 508, in this case the "Toronto Tower," is, as shown in FIG. 6, first mapped to the average location determined in step S416 once the location has been determined to be stable in step S420. Then, a process
similar to that described above for contour tracking may be performed. Examination is made within a search window of the mapped image for a best match, the search window being of predetermined size (step S608). The average location is then updated by the translation determined to align the mapped image with the best-matching candidate image (step S612). The corresponding label, placed to point to the updated location, is then linked in step S424, thereby reducing systematic error.

A preferred embodiment not only minimizes random and systematic error in the parameter readings, but reduces the storage/processing overhead by limiting the world model database 132 to data specific to the objects/points of interest. In particular, the database 132 need not include data throughout three-dimensional space to enable determination as to whether occlusion of the point/object exists. It suffices to model merely the points/objects of interest to include characteristics such as, for example, shape, color and shading, these being stored for multiple possible orientations or, alternatively, extracted in real-time from a three-dimensional model. This expands upon the concept introduced in FIG. 6 of storing shape information. As in the FIG. 6 embodiment, once the object/point is determined to be within the FOV, known image recognition techniques can be employed to decide whether the object/point is sufficiently occluded so as to warrant dotted labeling or no labeling at all, depending on the implementation. More specifically, after execution of the location averaging sub-step (in step S416), query is made as to whether the point/object is visible in the image 302. This query involves retrieving the stored characteristics for the point/object and comparing them within a window of the image 302, that window being centered at the average location in a manner similar to that of step S608.
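The search of step S608 and the translation update of step S612 can be sketched with the exhaustive block matching conventionally used in motion compensation. The scoring by sum of absolute differences, the search radius and the function name are illustrative assumptions; the patent specifies only a search window of predetermined size and a best-match criterion:

```python
def best_match_translation(patch, image, top, left, search=2):
    """Exhaustively search a window around (top, left) in `image` for the
    translation that best aligns `patch`, scored by sum of absolute pixel
    differences. `patch` and `image` are lists of rows of grey values.
    Returns the (dx, dy) translation to apply to the average location."""
    h, w = len(patch), len(patch[0])
    best_score, best_offset = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            score = 0
            for r in range(h):
                for c in range(w):
                    score += abs(image[top + dy + r][left + dx + c]
                                 - patch[r][c])
            if best_score is None or score < best_score:
                best_score, best_offset = score, (dx, dy)
    return best_offset
```

The returned translation corresponds to the correction applied in step S612: shifting the averaged location by the offset that best aligns the stored shape with the image removes a systematic bias common to all the readings.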
The comparison may involve, for instance, merely a contour based on the retrieved characteristics as matched against a detected contour within the window. It is also noted that the image characteristics to be matched need not be exactly co-located with the point/object, but may, instead, lie within a vicinity of the point/object. In any event, if a predetermined level of pixel-to-pixel matching is not attained, the object/point is judged to be not visible in the image. Otherwise, if the match is close enough, the average location is updated as in step S612, not only reducing systematic error but, in addition, accounting for occlusion or non-occlusion of the object/point. This approach may be further modified, as mentioned above, by retention of respective rolling averages for the viewpoint and orientation. Despite paring down the world model 132 by the above-described method, the extra storage and processing overhead for the characteristic data may suggest off-loading, from a mobile device, the world model and much of the image processing to the aforementioned
server. With respect to the image processing, the camera/binoculars 100 may transmit the averaged parameter readings to the server (step S408). The server determines potential points/objects of interest that are included within the FOV, without taking occlusion into account. For each point/object, it determines how the object would look in the image 302, based on the received parameter readings. At least partial images are transmitted back, along with an indication of approximately where in the image 302 they are located. The above-described pixel-to-pixel matching is then applied. To prevent a wireless-bandwidth bottleneck in the event of multiple concurrently viewed points/objects of interest, the images transmitted back can be limited to particular camera-requested categories.

Video captured by a camcorder, for example, can be downloaded to a personal computer (PC) that efficiently identifies points/objects of interest while playing back the video. As shown in FIG. 7, the image-viewing apparatus, such as a camcorder, retains, in storage, contours that are linked to a label in step S424 (step S704). The stored video and contours are downloaded to the PC (step S708), or may be physically transferred, as by means of a PC card. As the video plays back, the contour(s) are matched to the playback image to identify the corresponding points/objects of interest. Such identification also makes available for presentation any metadata that has been stored in connection with the contour, such as the descriptive label or an identification, based on GPS and a mapping database, of the town or other location of the user at the time the video was recorded. Although a delay may exist between the time a contour or set of contours first appears in the video and the time at which the real-time label superimposition system 100 "locks" onto the contour(s) in step S424, the matching by the PC on playback identifies the object/point as soon as the appropriate contour(s) appear.
Advantageously, the labeling in the video playback may therefore first occur for an image recorded earlier than the image for which the labeling first appeared in the view finder of the camcorder. The contour retention and matching also apply to the digital photos stored by a digital camera such as that described with respect to FIG. 1. As a photo is taken, the contour(s) for visible points/objects of interest, and associated metadata, are retained for subsequent matching of the contours to photos displayed on the PC. Once a match is made, the metadata is displayed on-screen, superimposed on or alongside the respective photo. In this way, vacation photos are much easier to relive.

Referring to FIG. 8, even the need to re-match contours can be eliminated by storing the updated label location in the frame every time the video or photo frame is stored. In particular, since the label is linked, with respect to location, in fixed relation to the currently tracked
contour(s) are known, the image coordinates of the label are known and this information, along with any description, may be embedded in the frame to be stored. When the stored photos or video are replayed on the PC, these stored coordinates are read and used to correctly superimpose the stored labels. Simultaneously, any other metadata stored with the frame can be accessed for visual display, aural presentation as by means of a speech synt hesizer, etc. Query is first made as to whether a video or photo frame is to be stored (step S804). If so, the label location which may have been updated (step S612) and is linked in fixed relation to the contour(s) (step S424) is stored along with the frame to be stored. In particular, an identifier of the label location, and any metadata, may be stored as part of the frame or in linked relation to the frame (steps S808, S812). During re -presentation or playback of the frame on the PC, for example, the PC superimposes the label based on the identifier and performs any further presentation based on the metadata. This playback technique may also be applied, for example, to superimpose labeling in the LCD monitor 116 concurrently with image/video capture, in which case, the frame, label location(s) and description are stored in a buffer. As has been demonstrated above, a user -friendly adjunct to a digital camera, binoculars or other image-viewing apparatus automatically provides in real time labels to identify known points/objects of interest. An enhanced image -viewing experience is afforded, particularly for the tourist. While there have been shown and described what are considered to be preferred embodiments of the invention, it will, of course, be understood that various modifications and changes in form or detail could readily be made without departing from the spirit of the invention. 
For example, the user may be provided with an actuator on the user input device 128 by which to introduce onto, or exclude from, the stored digital photos labels displayed in the view finder. It is therefore intended that the invention not be limited to the exact forms described and illustrated, but should be construed to cover all modifications that may fall within the scope of the appended claims.