WO2004104520A1 - Method of operating a voice-controlled navigation system - Google Patents

Method of operating a voice-controlled navigation system Download PDF

Info

Publication number
WO2004104520A1
WO2004104520A1 (PCT/IB2004/050706)
Authority
WO
WIPO (PCT)
Prior art keywords
geographical
voice
dialog
recognition
user
Prior art date
Application number
PCT/IB2004/050706
Other languages
French (fr)
Inventor
Carsten Meyer
Original Assignee
Philips Intellectual Property & Standards Gmbh
Koninklijke Philips Electronics N. V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Philips Intellectual Property & Standards Gmbh, Koninklijke Philips Electronics N. V. filed Critical Philips Intellectual Property & Standards Gmbh
Priority to EP04733066A priority Critical patent/EP1631791A1/en
Priority to JP2006530859A priority patent/JP2007505365A/en
Publication of WO2004104520A1 publication Critical patent/WO2004104520A1/en

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C 21/34 Route searching; Route guidance
    • G01C 21/36 Input/output arrangements for on-board computers
    • G01C 21/3605 Destination input or retrieval
    • G01C 21/3608 Destination input or retrieval using speech input, e.g. using speech recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L 2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Abstract

A method of operating a voice-controlled navigation system (1) is described, in which, within an automatically conducted dialog, taking account of geographical criteria (GK), input requests (P) are generated and output to a user, and responses spoken by the user (S) are detected. The spoken responses (S) are analyzed for recognition of location data using an automatic voice recognition method, taking account of the geographical criteria (GK). In addition, a corresponding voice-data user interface for a navigation system is described.

Description

Method of operating a voice-controlled navigation system
The invention relates to a method of operating a voice-controlled navigation system. In addition, the invention relates to a voice-data user interface for a navigation system, a navigation system with a voice-data user interface of this kind, and a computer program in order to execute the method on a processor of a voice-data interface of a navigation system. Furthermore, the invention relates to a method of generating a geographical database for use in the said method in order to operate a voice-controlled navigation system.
Modern motor vehicles are increasingly using navigation systems. Navigation systems of this kind enable the user to work out routes to particular destinations and to be guided along the route during the journey. To be able to offer these functions, navigation systems use geographical data containing inter alia information on geographical areas, towns, locations, buildings, streets, junctions, favorable journey times along specific sections of road, speed limits on roads, etc. Using this geographical data, a navigation system is able to find the optimum route, i.e. the shortest and/or fastest one, from a start point to a particular destination. The start point and/or the destination can be input by the user via a suitable user interface. Alternatively, the start point - to the extent that it is the current start point - can also be determined using automatic position-determining equipment, e.g. GPS, in the case of some navigation systems.
The user interface generally comprises a keyboard for inputting the location data. "Location data" here means geographical data on any locations, areas, buildings, roads, etc. The more convenient navigation systems are equipped, alternatively or in addition, with a voice-data user interface with which the user can communicate in natural language. Since a voice-data user interface enables hands-free operation of the equipment, controlling navigation systems in motor vehicles in this manner is preferable from a safety standpoint: the driver can operate the navigation system during the journey without having to take his hands off the vehicle's steering wheel.
When a voice-data user interface of this kind is used, a spoken response expressed by the user, e.g. specifying a location or giving a command, is detected as a voice signal by means of a microphone. The spoken response is then sent to a voice recognition device so that the location or command can be recognized and passed on, in machine-readable form, to the control device of the navigation system. Voice recognition systems generally operate such that the spoken response (also referred to below as the voice signal) is first analyzed spectrally or in time, and the analyzed voice signal is then compared, section by section, with different models of possible signal trains with known voice information. To this end, the voice recognition system is generally equipped with an entire library of different possible signal trains. By comparing the received voice signal with the available models, the model that best matches a particular section of the voice signal is selected in order to reach a recognition result. The probability with which each model matches the associated section of the voice signal is normally calculated for this purpose. During the analysis and the calculation of how well the individual models match the relevant sections of a voice signal, grammatical and/or linguistic rules are generally consulted. This avoids the possibility that the individual sections of the longer voice signal merely match well in isolation with the particular models available, and ensures that the context within which the sections of the voice signal occur is taken into account, so that a useful recognition result is arrived at overall and the error rate is reduced.
However, one problem with the use of voice recognition systems in navigation systems is that location data, i.e. the designations of towns, federal states, streets, buildings etc., often involve proper names whose spelling and pronunciation may be extremely unusual. The recognition result could be improved by making complete models of all possible location data available to the voice recognition system as a restricted, active vocabulary. Owing to the huge number of possibilities, however, restricting the vocabulary of the voice recognition device globally to all location data that could possibly arise is not practicable. On the other hand, particularly in the case of voice-data user interfaces for navigation systems, spoken responses generally have to be input under unfavorable conditions, i.e. with a relatively large amount of background noise. To this extent, additional constraints that improve recognition quality during voice recognition would be extremely helpful.
It is an object of the present invention to specify an improved method of operating a voice-controlled navigation system, and a corresponding voice-data user interface for a navigation system, which increase the quality of voice recognition in a simple manner.
This object is achieved by a method of operating a voice-controlled navigation system in which input requests are generated within an automatically conducted dialog, taking account of geographical criteria, and output to a user, responses spoken by the user are detected, and the spoken responses are analyzed for recognition of location data using an automatic voice recognition method, taking account of the geographical criteria.
In terms of equipment, the object is achieved by a voice-data user interface for a navigation system with an output device for outputting input requests to a user, a voice-inputting device for detecting spoken responses by the user, a dialog-control device for controlling a dialog with the user, taking account of geographical criteria, a prompt-generation unit for generating input requests, taking account of geographical criteria, a voice recognition device and an analysis unit for analyzing the detected spoken responses for the recognition of location data, taking account of geographical criteria, and a geographical database and/or a data interface for accessing a geographical database, which makes available geographical criteria and/or geographical data for the dialog-control device and/or the prompt-generation device and/or the voice recognition device and/or the analysis unit.
Control of the dialog sequence by means of the dialog-control device takes place using a dialog script stored in the system in a special dialog-writing language. This may be any dialog-writing language. Examples of usual languages are method-oriented programming languages such as C or C++, or so-called hybrid languages, which are both declarative and method-oriented, such as VoiceXML or PSP HDDL, languages with a structure similar to HTML, the language generally used to write Internet sites. Control essentially takes place through the outputting of corresponding input requests, generally also known as prompts, to the user.
Using the automatically conducted dialog, the desired destination is determined iteratively in multiple query steps. Since the individual prompts output within the automatic dialog take account of geographical criteria, these geographical criteria can be used accordingly within the voice recognition method for recognizing the spoken response following the particular prompt, for instance to restrict the active vocabulary or to evaluate recognition hypotheses. As a result, the recognition results in the individual stages of the dialog are considerably improved, which leads overall to an extremely reliable recognition of the correct location.
The corresponding input request may, for instance, be generated by selecting a particular input request from a group of predefined input requests. Equally, the input request may be generated completely anew as a function of the particular position within the dialog sequence.
A prompt of this kind can, in principle, be generated in any manner by the prompt-generation unit and output to the user, e.g. in written form on a display or similar. Preferably, however, the output takes place in spoken form. This has the advantage that the user can register the prompt while simultaneously continuing to observe the traffic, as a result of which the operation of the navigation device becomes even safer. Both a spoken and a simultaneous visual output are also possible. Used for the spoken output may be, for instance, a voice synthesizer (text-to-speech converter) to convert the prompts output from a text form into a spoken form. To the extent that pre-prepared prompts are used, these may also be stored in an audio database. It is also possible for the prompt-generation unit to compile a prompt section-by-section from prepared audio data, for instance particular parts of a sentence, hereby also generating individual parts by means of a voice synthesizer where applicable.
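By way of illustration only, a prompt assembled section-by-section from prepared audio data with a synthesis fallback might look like the following Python sketch; the audio file names and the `synthesize` placeholder are assumptions made for the example and are not part of the described system.

```python
# Minimal sketch (hypothetical helper names, not a real TTS API): assemble a prompt
# from pre-recorded audio fragments and fall back to synthesis for missing parts.

AUDIO_LIBRARY = {
    "In what town is": "town_question.wav",
    "your destination?": "destination.wav",
}

def synthesize(text):
    # Placeholder for a text-to-speech converter that renders a missing fragment.
    return "tts:" + text

def assemble_prompt(fragments):
    return [AUDIO_LIBRARY.get(f) or synthesize(f) for f in fragments]

print(assemble_prompt(["In what town is", "your destination?"]))
# ['town_question.wav', 'destination.wav']
print(assemble_prompt(["In which federal state is", "your destination?"]))
# ['tts:In which federal state is', 'destination.wav']
```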
The dependent claims contain especially advantageous embodiments and further embodiments of the invention. The voice-data user interface in accordance with the invention may hereby be further developed by analogy with the claims relating to the method.
There are various options for using the geographical criteria on the basis of which a prompt has been generated, when a subsequent spoken response is recognized. In a preferred embodiment, using the geographical criteria taken into account in the generation of the input request, a word list is compiled, which serves as a restricted active vocabulary during the voice recognition of the subsequent spoken response by the user. So, for example, on generation of an input request "In which federal state is your destination located?" (wherein, in this example, it is assumed that this is a navigation system with a geographical database currently in use that covers the whole of Germany), a relatively short word list, comprising only the names of all federal states in the country, would be used in the voice recognition of the subsequent spoken response by the user.
Alternatively or in addition to this, the word list currently active may also be compiled as a function of recognition results of previous spoken responses within the dialog by the user. One example here would be that the user has already input, in a previous stage of the dialog, that the destination is located in the federal state of North Rhine-Westphalia. For a voice recognition of the user's spoken response to the subsequent input request "In what town is your destination?" it is then sufficient for all names of towns within the federal state of North Rhine-Westphalia to be included in the word list.
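A minimal sketch of such a word-list compilation is given below in Python; the toy database, its field names and the `compile_word_list` helper are illustrative assumptions, not the data model of the described navigation system.

```python
# Minimal sketch (not the patented implementation): compile a restricted word list
# from a toy geographical database, using a result recognized earlier in the dialog.

GEO_DB = [
    {"name": "Cologne",     "type": "town", "state": "North Rhine-Westphalia"},
    {"name": "Duesseldorf", "type": "town", "state": "North Rhine-Westphalia"},
    {"name": "Munich",      "type": "town", "state": "Bavaria"},
    {"name": "Hamburg",     "type": "town", "state": "Hamburg"},
]

def compile_word_list(db, entry_type, **criteria):
    """Return the active vocabulary for the next recognition step."""
    return [e["name"] for e in db
            if e["type"] == entry_type
            and all(e.get(k) == v for k, v in criteria.items())]

# Previous dialog step: the federal state was recognized reliably.
recognized_state = "North Rhine-Westphalia"

# Vocabulary for "In what town is your destination?" is restricted to that state.
word_list = compile_word_list(GEO_DB, "town", state=recognized_state)
print(word_list)   # ['Cologne', 'Duesseldorf']
```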
Similarly, recognition results of the user's spoken responses subsequently within the dialog may also be used to restrict the active vocabulary for a renewed recognition of spoken responses already made by the user, in order to improve the recognition or to enable it in the first place. One example here would be a dialog in which a prompt "In what town is your destination?" is firstly output. To the extent that the voice recognition cannot then provide a reliable recognition result, for instance because none of the recognition hypotheses has an adequate confidence level, a prompt such as "What large conurbation is in the vicinity?" may be output in the next stage of the dialog. A restricted word list with large conurbations can then be used for the spoken response that follows the second prompt. The recognition result from this query can then serve for the compilation of a word list comprising only towns located in the vicinity of the recognized large conurbation for a further attempt at recognition of the spoken response to the first prompt. This kind of repeat recognition of previous spoken responses stored as audio data can - if desired - also take place as a matter of course. This kind of restriction of the active vocabulary, as a result of which only specific recognition results are permitted, is known as a "hard" restriction.
Alternatively or in addition to this, different recognition hypotheses that have been determined during a voice recognition of a spoken response by the user may be evaluated, using the geographical database, by means of the geographical criteria taken into account in the generation of the previous prompt. An evaluation of this kind can also take place as a function of recognition results of spoken responses by the user previously and/or subsequently within the dialog.
This subsequent evaluation may take place in a "soft" form, in which an "n-best list" is compiled, comprising a particular number "n" of the most probable recognition hypotheses in an ordered sequence. In evaluating the hypotheses for compiling the n-best list, it is ensured that, as regards the geographical criteria, the recognition hypotheses are consistent with previous and/or subsequent recognition results and/or the geographical criteria of the input request. An n-best list of this kind is preferably also generated if the active vocabulary has been restricted previously. Otherwise, an evaluation may also take place according to "hard" exclusion criteria, so that the active vocabulary is effectively restricted after the fact.
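The following Python sketch illustrates a "soft" evaluation of this kind under an assumed weighting scheme; the penalty factor and the toy n-best list are examples only and are not taken from the patent.

```python
# Minimal sketch (the weighting scheme is an assumption): re-rank an n-best list
# so that geographically consistent hypotheses come out on top.

def rerank(n_best, is_consistent, penalty=0.5):
    """n_best: list of (name, acoustic_score) pairs; inconsistent entries are down-weighted."""
    rescored = [(name, score * (1.0 if is_consistent(name) else penalty))
                for name, score in n_best]
    return sorted(rescored, key=lambda item: item[1], reverse=True)

# Acoustically similar town names compete in the n-best list.
n_best = [("Giessen", 0.62), ("Essen", 0.58), ("Emden", 0.21)]

# Earlier in the dialog: the destination lies near the Ruhr conurbation.
near_ruhr = {"Essen", "Bochum", "Dortmund"}
ranked = rerank(n_best, lambda town: town in near_ruhr)
print(ranked[0][0])   # 'Essen' - the hypothesis consistent with the geographical criteria wins
```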
Particularly if the data requested from the user within the dialog does not concern clearly defined facts, such as the name of a federal state, a country etc., but relates to "soft" criteria, such as the size of a town, or if it involves the relationship of different geographical locations to one another, such as the proximity to a large conurbation, then considerations of geographical criteria of these kinds in the form of evaluations of the recognition hypotheses are often more useful than hard exclusion criteria, such as restriction of the active vocabulary. A combined utilization method for the geographical voice recognition criteria within one and the same dialog is also perfectly possible. For instance, for the voice recognition of a spoken response to a first prompt, the active vocabulary may be restricted, and, for the voice recognition of a spoken response to the second prompt, the geographical criteria may be used for evaluating the recognition hypotheses. For the voice recognition of a specific spoken response, both a restriction of the active vocabulary as regards certain criteria, and, in addition, an evaluation of recognition hypotheses on the basis of various other criteria may also take place.
There are also various options for the selection of the dialog sequence itself. For instance, a dialog may, in principle, be strictly hierarchically structured according to geographical criteria, i.e. prompts hierarchically structured according to geographical criteria are generated within the dialog sequence. A typical example here would be, firstly, an enquiry as to the country in which the destination is located, then, for example, as to the federal state if applicable, then a region, and finally the town, followed by the street, wherein the area is further narrowed down at each stage and, accordingly, only the responses that are possibilities in this area are compiled into word lists.
In an alternative procedure, the input requests as to geographical criteria are generated within the dialog as a function of recognition results of previous spoken responses by the user. For example, an enquiry could be made as to the nearest large conurbation if, in a first step, the recognition result of a response to the enquiry as to the destination was unsatisfactory. If, on the other hand, e.g. the town in which the destination is located has been clearly recognized in the first step, the street may be queried immediately in the next step of the dialog.
When structuring a dialog according to hierarchically structured geographical criteria, it is also possible to use, in addition, recognition results of previous spoken responses in order to determine the further steps within the dialog sequence. A typical example is the case where the federal state "Berlin" has been specified in response to the query as to the federal state in which the destination is located. In the subsequent input request, rather than asking for the town within the federal state, it would be more useful to enquire about e.g. the administrative district of the town in which the destination is located.
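One possible way of choosing the next input request from the recognition results gathered so far is sketched below in Python; the rule set and slot names are illustrative assumptions rather than the claimed dialog program.

```python
# Minimal sketch (rule set and slot names are assumptions): pick the next input
# request from what has already been recognized, rather than a fixed hierarchy.

CITY_STATES = {"Berlin", "Hamburg", "Bremen"}

def next_prompt(known):
    if "state" not in known:
        return "In which federal state is your destination located?"
    if known["state"] in CITY_STATES and "district" not in known:
        # Asking for a town inside Berlin would be pointless; query the district instead.
        return "In which administrative district of {} is your destination?".format(known["state"])
    if "town" not in known:
        return "In what town is your destination?"
    return "What is the street name?"

print(next_prompt({"state": "Berlin"}))
# In which administrative district of Berlin is your destination?
print(next_prompt({"state": "North Rhine-Westphalia"}))
# In what town is your destination?
```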
With both methods, there preferably exists the possibility that - if the user cannot answer the particular question, e.g. the question as to an administrative district of the town of Berlin or about a particular region within the federal state - the dialog step can be skipped by means of an appropriate response, such as "unknown", or can be replaced with a different enquiry that also narrows down the area.
The geographical database used within the dialog, e.g. for compiling a word list and/or for evaluating a recognition hypothesis, is preferably restricted, as far as possible, on the basis of a previous input request and/or a previous recognition result of a spoken response to certain data entries. By restricting the database in previous steps for the subsequent steps, an extraction of the appropriate word list can take place considerably more quickly, since the number of data entries that have to be searched in order to compile the word list is correspondingly smaller.
Further, the use of a geographical database with data entries that each have assigned to them one or more markers representing a type of the data entry concerned is particularly preferred. A geographical type of data entry would be, for example, whether the data entry concerned relates to a country, a federal state, a town or a large conurbation, or also, in which federal state a town is located, etc. The markers may also represent a geographical hierarchy level. Using these markers, a restriction of the database for further steps can be accomplished considerably faster, and/or a word list can be extracted more quickly or post-processed more effectively, since searching is restricted to entries with specific markers, wherein the type of marker, e.g. the current hierarchy level or the currently queried geographical type, is defined for a recognition or evaluation of a specific spoken response by the previous prompt or dialog stage.
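The following Python sketch shows one conceivable representation of such marked data entries and a marker-restricted search; the field names and marker strings are assumptions chosen for the example.

```python
# Minimal sketch of marker-based data entries (field names are illustrative assumptions).
from dataclasses import dataclass, field

@dataclass
class DataEntry:
    name: str
    markers: set = field(default_factory=set)   # e.g. {'<town>', '<large conurbation>'}
    level: int = 0                               # geographical hierarchy level

DB = [
    DataEntry("Germany", {"<country>"}, level=0),
    DataEntry("North Rhine-Westphalia", {"<federal state>"}, level=1),
    DataEntry("Cologne", {"<town>", "<large conurbation>"}, level=2),
    DataEntry("Eilendorf", {"<town>", "<small town>"}, level=2),
]

def entries_with_marker(db, marker):
    """Only entries carrying the marker are searched - the rest are skipped."""
    return [e.name for e in db if marker in e.markers]

print(entries_with_marker(DB, "<large conurbation>"))   # ['Cologne']
```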
The dialog control device, the prompt generation device, the voice recognition device and the analysis unit may each be software components implemented on suitable hardware, for instance a processor of a voice-data user interface of a navigation system. It is not absolutely essential for the voice-data user interface to be equipped with its own processor for this purpose; instead, the voice-data user interface may also share a processor that is used for the remaining functions of the navigation system. It is pointed out in this connection, in particular, that the navigation system does not necessarily have to be a single structural unit: various components of the system may also be realized on various items of equipment that are connected to one another. This applies, in particular, to the voice-data user interface itself, the components of which may also be implemented on spatially separated processors. Thus, for instance, a voice recognition device with an analysis unit could be implemented on a particularly high-capacity server on the Internet and be connected via a data connection to the further components of the navigation system, located in, for instance, a motor vehicle of the user.
Since the dialog control device, the prompt generation device, the voice recognition device and the analysis unit may each take the form of software modules, it is possible to retrofit existing navigation systems equipped with an appropriate output device, such as a loudspeaker and/or a display, with the voice-data user interface in accordance with the invention. All that is necessary is for the system to be equipped with a voice input unit, e.g. a simple microphone, and for an appropriately powerful processor to be available or for appropriate connections to a powerful processor to exist. Access to a geographical database exists per se in a navigation system, since this requires a database for calculating routes. The database may be stored on a mass storage device located on the navigation system, such as a CD. However, it may also be interrogated via a network, e.g. the Internet.
The database should preferably be modified in advance as part of a pre-processing stage. For example, markers representing the type of the database entry concerned and/or a geographical hierarchy level and/or other geographical features, e.g. the position on a river, may be assigned to the individual data entries. In addition, the database may be sorted hierarchically and/or relationships between the individual database entries may be established, and geographical criteria thereby determined. Geographical criteria of this kind may be stored at a separate location in the database or be contained directly in the database entries. For instance, a database entry "Eilendorf near Aachen" simultaneously also contains the relationship between two towns. With a database prepared in this way for use in accordance with the invention, a navigation system in accordance with the invention becomes faster and more efficient.
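One conceivable pre-processing pass of this kind is sketched below in Python; the population thresholds, the simple coordinate test and the field names are assumptions made for the example, not prescribed by the described method.

```python
# Minimal sketch of a possible pre-processing pass: assign type markers and
# near-neighbour relations to raw entries before the database is used in dialogs.

RAW = [
    {"name": "Cologne",   "inhabitants": 1_080_000, "coord": (50.94, 6.96)},
    {"name": "Eilendorf", "inhabitants": 15_000,    "coord": (50.78, 6.16)},
    {"name": "Aachen",    "inhabitants": 250_000,   "coord": (50.78, 6.08)},
]

def preprocess(raw, near_deg=0.2):
    for e in raw:
        e["markers"] = {"<town>"}
        if e["inhabitants"] > 1_000_000:
            e["markers"].add("<town of over 1 million inhabitants>")
        if e["inhabitants"] > 500_000:
            e["markers"].add("<large conurbation>")
        # Store simple 'near' relationships so they need not be computed at run time.
        e["near"] = [o["name"] for o in raw if o is not e
                     and abs(o["coord"][0] - e["coord"][0]) < near_deg
                     and abs(o["coord"][1] - e["coord"][1]) < near_deg]
    return raw

for entry in preprocess(RAW):
    print(entry["name"], sorted(entry["markers"]), entry["near"])
# e.g. Eilendorf ['<town>'] ['Aachen']  - the "Eilendorf near Aachen" relationship
```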
The invention will be further described with reference to examples of embodiments shown in the drawings, to which, however, the invention is not restricted. Fig. 1 shows a schematic diagram of the system architecture of one embodiment example of a navigation system in accordance with the invention.
Fig. 2 shows a dialog block diagram to explain one possible dialog sequence between a user and the system in accordance with the invention.
In principle, the navigation system 1 shown in Fig. 1 may be a conventional navigation system equipped with all components normally present in a navigation system in order to guarantee the necessary functionalities. For the sake of simplicity, these components of the navigation system 1 are shown here in just one single block 13. For communication with the user, the navigation system 1 is equipped with a voice-data user interface 2 in accordance with the invention, the components of which are shown in greater detail in Fig. 1. One component of this voice-data user interface 2 is an input/output interface 10, to which a voice inputting device 11, here a microphone 11, and a voice output device 12, here a loudspeaker 12, are connected. Via the microphone 11, the voice-data user interface 2 can detect spoken responses S by the user. Via the loudspeaker 12, the voice-data user interface 2 can output prompts P, e.g. in order to induce the user to make a spoken response S.
One further important component of the user interface 2 is a voice recognition device 6, which pre-processes the incoming spoken responses S, processes them and supplies recognition hypotheses EH at an output. These recognition hypotheses EH are then further processed in an analysis unit 7 so that the contents of the spoken responses - for instance commands or location details - can be understood.
The voice-data user interface 2 is further equipped with a prompt generation unit 5 by means of which the prompts P to be output to the user are generated. Responsible for controlling the dialog between the voice-data user interface 2 and the user by means of the prompts P output by the system, and taking account of the spoken responses S by the user input in response, is a dialog control device 3 (also referred to below as the dialog manager 3), which controls the dialog in accordance with a predetermined dialog program. To this end, the dialog control device 3 is connected to the prompt generation unit 5, the voice recognition device 6, the analysis unit 7 and the input/output interface 10. As a result, the dialog control device 3 may transmit, for example, a prompt generation command PB to the prompt generation device 5, thereby inducing it to output a specific prompt P. As soon as a spoken response S is detected by the microphone, the dialog control device 3 is informed via the input/output interface 10, and sends a start command AS to the voice recognition device 6 and the analysis unit 7. A further component of the voice-data user interface 2 that is significant for the invention is a geographical database 8. This database 8 is represented here as a component integral to the voice-data user interface 2. This may, however, also perfectly well be a general geographical database of the navigation system 1, which is used - possibly only in part - by, among others, the voice-data user interface 2 of the navigation system 1. This operating mode of a voice-data user interface 2 structured in this manner is explained below by reference to the dialog block diagram shown in Fig. 2:
A dialog generally begins - following a normal activation, e.g. by a voice command or by manual operation of the equipment - by the dialog manager 3 outputting a prompt output command PB to the prompt generator 5 in order for a particular prompt P to be output to the user. The generation of this prompt P takes account of specific geographical criteria GK, which are predetermined in the dialog program or which the dialog manager 3 can retrieve from the geographical database 8.
In this database 8 are located data entries DE, e.g. the names and further geographical data on countries, regions, federal states, towns, streets, significant sights, full addresses, etc. The data entries DE may be entered into the database 8 in different ways. For instance, the individual data entries DE may each contain markers M, which indicate the geographical category or the type to which the data entry DE is assigned, such as <country>, <federal state>, <town>, <administrative district of a town>, etc., or <small town>, <large conurbation>, <town of over 1 million inhabitants>, etc. As an alternative or in addition, the database may also be hierarchically organized and/or divided into different parts. For a territory such as Germany, for instance, different partial databases could be available for the individual federal states, in which, in turn, the towns are entered. Hierarchically arranged under the towns are the administrative districts of the towns and then, under the individual administrative districts, the street names, etc. In addition, certain geographical criteria, such as relationships between the individual data entries DE, e.g. the proximity of two towns to one another, may also be stored in the database 8. In particular, the database 8 may contain an area which records which geographical criteria can be determined from the database without great effort, or for which geographical criteria ready-prepared data records are available.
Simultaneously with the prompt output command PB, the dialog manager 3 outputs a list compilation command LB to a word list generator 9, which, according to the geographical criteria currently being sought, retrieves the data entries DE from the geographical database 8 and compiles from these a word list WL, which comprises the active vocabulary for a voice recognition of a subsequent spoken response S by the user. In addition, the dialog manager 3 transmits a start command AS to the voice recognition device 6 and the analysis device 7, which are shown here as one block. The word list generator 9 may be a separate module. However, it may also be a sub-routine of the voice recognition device 6, as shown in Fig. 1 by way of example. The voice recognition device 6 then determines evaluated hypotheses for the spoken response S following the prompt P, wherein the spoken responses are each compared with the stored acoustic models of the words contained in the word list WL compiled by the word list generator 9. Since this is a relatively restricted word list WL, a higher recognition probability is possible than with a complete word list of all geographical proper names.
The best-evaluated recognition result EE or multiple recognition hypotheses EH are then checked again by the analysis device 7, where applicable, for consistency with the data entries DE in the geographical database 8 and/or with previous recognition results and with the preceding prompt. To this end, the analysis device 7 retrieves consistency check data KCD from the database 8 where applicable. If a recognition result EE is certain, the database 8 is, where applicable, restricted for the further course of the dialog if, for instance, it can be reliably ruled out, on the basis of the recognition result EE or the hypotheses EH, that certain data entries DE in the database 8 will occur in subsequent spoken responses. So, for example, given a reliable recognition of the words "Lower Saxony" in response to the input request "Please enter the federal state in which your destination is located", all location data in other federal states can be dispensed with for the following steps of the dialog.
The recognition result EE is also reported back to the dialog manager 3 and is there entered in a "slot filling module" 4, which documents the current overall state of knowledge of the system. This slot filling module 4 of the dialog manager 3 decides when the information is sufficient, i.e. when all query points have been clarified, in order that, for example, the destination or the start point can be precisely defined. If the information is not yet sufficient, a further dialog step follows, in which a prompt output command PB is again output to the prompt generator 5, a list compilation command LB to the word list generator 9 and a start signal AS to the voice recognition system 6, in order for the next spoken response to be recognized. In this step, only the previously restricted database 8 is then used, so the entire system can operate considerably faster in the following dialog steps.
If the slot filling module 4 establishes that all necessary information is present, the prompt generator 5 is induced to issue a corresponding prompt confirming the desired destination, and the destination is transmitted to the further components (here once again shown as a block 13) of the navigation system 1 for further processing.
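A minimal sketch of such a slot-filling check is given below in Python; the slot names and the set of required slots are illustrative assumptions.

```python
# Minimal sketch of a slot-filling check: the dialog ends only once every slot
# needed to define the destination is filled by a recognition result.

REQUIRED_SLOTS = ("town", "street", "house_number")

def destination_defined(slots):
    return all(slots.get(s) for s in REQUIRED_SLOTS)

slots = {"state": "Lower Saxony"}
slots["town"] = "Hanover"            # recognition result EE of a later dialog step
print(destination_defined(slots))    # False -> another prompt/recognition cycle follows
slots.update(street="Lister Meile", house_number="12")
print(destination_defined(slots))    # True  -> confirm destination, hand over to routing
```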
This sequence is described more specifically below with reference to two examples:
In the first example, it is assumed that the dialog sequence is hierarchically structured according to geographical criteria. In this case, a prompt, for example "In what country is your destination?" is output by the prompt generator 5 in a first step. As only the names of different countries are expected as the spoken response to this input request, a word list with the possible countries is generated by the word list generator 9 on the basis of the database 8. This word list is then available during the voice recognition for the subsequent spoken response. This is the first hierarchical level of the dialog. Once the matching country has been correctly recognized, e.g. if the country "Germany" has been stated, the prompt "In which federal state is your destination?" is then generated in a second hierarchical level. A word list is then compiled, listing all the federal states in Germany. Then, in a further hierarchical level, the town is queried or, if applicable, a particular region is queried in an intervening hierarchical step. Once a town has been established, the administrative district may be queried in the case of larger towns, and finally, at one of the lower stages, the street name and a house number, or a particular building, etc.
In the second embodiment example, it is assumed that the individual database entries in the database 8 are provided with markers, which represent particular types of database entries or particular relationships between the database entries. With this variant, the dialog sequence is not strictly hierarchically structured from large down to small geographical units, but is relatively flexible. A dialog sequence of this kind may, in certain circumstances, i.e. under good recognition conditions, reach the destination in fewer steps than a dialog sequence with a strictly hierarchical structure. In this case, the dialog control unit 3 firstly selects, for example, a prompt "To which town do you want to travel?". Then, if applicable, a word list with all town entries available in the database 8 will be compiled. To the extent that no further restrictions have been undertaken previously, this will, of course, be a relatively long list. In other words, the active vocabulary of the voice recognition system is extremely broad, making voice recognition considerably harder than if the word list had previously been restricted by prior queries as to countries, federal states, etc. If the voice recognition system nevertheless yields an acceptable recognition result, the town of the destination is clarified with just one query, and the inputting of the further address data, such as street and house number, can then take place in a subsequent dialog step.
If, however, the system is not sure of the result, e.g. because a calculated confidence level of the various recognition hypotheses is insufficient, this fact may be delivered back to the dialog control device 3 as a (preliminary) recognition result. The latter could then, in a subsequent dialog step, output a further prompt, e.g. "What large conurbation is in the vicinity?". An active word list restricted to large conurbations will then be compiled. This is possible relatively easily in that all data entries DE containing the marker <large conurbation> are searched for from the database 8. This word list is considerably smaller than the word list in the first dialog step, so, owing to the smaller active vocabulary, the recognition result EE will be better in the case of this second query than for the first. Using the recognition result EE, all data entries DE in the database 8 that are located in the vicinity of the large conurbation sought can then be extracted. If applicable, all data entries DE that fulfil the condition of being located near the recognized large conurbation may also be marked in a first step. The new word list is then compiled, containing all towns that fulfil the condition. If the spoken response of the user to the previous query as to the desired town has been stored, it is now possible to undertake a voice recognition for this first spoken response once again with the restricted word list in order to arrive at a better recognition result. Alternatively, the dialog manager 3 may also induce the prompt generation device 5 to output the first prompt "To which town do you want to travel?" once again and then to undertake the voice recognition of the subsequent spoken response with the restricted word list.
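The fallback strategy of this second example can be sketched as follows in Python; the canned recognizer, the confidence threshold and the toy database are assumptions standing in for the real voice recognition device and geographical database.

```python
# Minimal sketch of the fallback: if the first recognition is uncertain, query a
# nearby large conurbation and re-recognize the stored first response with a
# vocabulary restricted to towns near that conurbation.

DB = [
    {"name": "Aachen",    "markers": {"<town>", "<large conurbation>"}},
    {"name": "Cologne",   "markers": {"<town>", "<large conurbation>"}},
    {"name": "Eilendorf", "markers": {"<town>"}, "near": "Aachen"},
    {"name": "Elsdorf",   "markers": {"<town>"}, "near": "Cologne"},
]

def recognize(audio, vocabulary):
    """Stand-in recognizer with canned confidences; a real engine scores acoustically."""
    canned = {
        "audio_town": {"Elsdorf": 0.56, "Eilendorf": 0.54},
        "audio_hub":  {"Aachen": 0.93, "Cologne": 0.41},
    }
    scores = canned[audio]
    return sorted(((w, scores.get(w, 0.0)) for w in vocabulary),
                  key=lambda s: s[1], reverse=True)

towns = [e["name"] for e in DB if "<town>" in e["markers"]]
best, conf = recognize("audio_town", towns)[0]       # ('Elsdorf', 0.56): too uncertain
if conf < 0.7:
    # Next prompt: "What large conurbation is in the vicinity?"
    hubs = [e["name"] for e in DB if "<large conurbation>" in e["markers"]]
    hub, _ = recognize("audio_hub", hubs)[0]                  # 'Aachen'
    # Re-recognize the stored first response with a vocabulary near that conurbation.
    nearby = [e["name"] for e in DB if e.get("near") == hub]
    best, conf = recognize("audio_town", nearby)[0]           # ('Eilendorf', 0.54)
print(best)
```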
In conclusion, it is pointed out once again that the invention is not restricted to the above-described embodiment examples - in particular the precise structure of the voice-data user interface or the precise sequence of the explained dialogs - but may be varied to a large degree by a person skilled in the field without exceeding the bounds of the invention. In particular, it is possible to call upon further criteria, especially supplementary geographical knowledge, for the voice recognition. For example, distances from the current location may also be taken into account in the evaluation of recognition hypotheses and/or the compilation of word lists. The number of times a destination has been visited hitherto by the particular user may also be taken into account, since, in many cases, a user makes frequent journeys to the same locations. It is also possible for multiple queries to be covered by one prompt, e.g. "State the desired destination and a large town in the vicinity". In the subsequent voice recognition, the geographical relationships of various towns can then be used in order that the recognition hypotheses are better evaluated. Furthermore, the slot filling module could, for example, be arranged in the analysis device rather than in the dialog manager.
The invention has been described largely with reference to examples in which a destination has to be determined. In the same way, however, the starting point or other location data, such as intermediate stops etc., may also be determined in a dialog between the user and the system. In other words, multiple similar dialogs may be conducted successively.
For the sake of completeness, it is also pointed out that the use of the indefinite article "a" or "an" does not exclude the possibility that the features in question may also be present more than once, and that the use of the term "comprising" does not preclude the existence of further elements or steps.

Claims

CLAIMS:
1. A method of operating a voice-controlled navigation system (1) in which, within an automatically conducted dialog:
- input requests (P) are generated, taking account of geographical criteria (GK), and output to a user,
- responses spoken by the user (S) are detected,
- the spoken responses (S) are analyzed for recognition of location data using an automatic voice recognition method, taking account of the geographical criteria (GK).
2. A method as claimed in claim 1, characterized in that, using the geographical criteria (GK) taken into account in the generation of the input request (P), and/or as a function of recognition results (EE) of previous and/or subsequent spoken responses (S) within the dialog by the user, a word list (WL) is compiled from a geographical database (8) to serve as an active vocabulary during the voice recognition of a subsequent spoken response by the user.
3. A method as claimed in claim 1 or 2, characterized in that recognition hypotheses (EH) that have been determined during a voice recognition of a spoken response (S) by the user are evaluated, using a geographical database (8), by means of the geographical criteria (GK) taken into account in the generation of a previous input request (P) and/or as a function of recognition results (EE) of spoken responses (S) by the user previously and/or subsequently within the dialog.
4. A method as claimed in any one of claims 1 to 3, characterized in that input requests (P) hierarchically structured according to geographical criteria are generated within the dialog.
5. A method as claimed in any one of claims 1 to 4, characterized in that the input requests (P) concerning geographical criteria are generated within the dialog as a function of recognition results of previous spoken responses by the user.
6. A method as claimed in any one of claims 1 to 5, characterized in that a geographical database (8) is used within the dialog for compiling a word list (WL) and/or for evaluating a recognition hypothesis, which database has been restricted on the basis of a previous input request (P) and/or a previous recognition result (EE) of a spoken response to certain data entries (DE).
7. A method as claimed in any one of claims 2 to 6, characterized in that, in order to compile a word list (WL) and/or to evaluate a recognition hypothesis (EH), a geographical database (8) is used, with data entries (DE) that each have assigned to them one or more markers (M) representing a type of the data entry (DE) concerned and/or a geographical hierarchy level and/or a relationship to other data entries (DE) and/or other geographical features.
8. A voice-data user interface (2) for a navigation system (1) with:
- an output device (12) for outputting input requests (P) to a user,
- a voice-inputting device (11) for detecting spoken responses (S) by the user,
- a dialog-control device (3) for controlling a dialog with the user, taking account of geographical criteria (GK),
- a prompt-generation unit (5) for generating input requests (P), taking account of geographical criteria (GK),
- a voice recognition device (6) and an analysis unit (7) for analyzing the detected spoken responses (S) for the recognition of location data, taking account of geographical criteria (GK),
- a geographical database (8) and/or a data interface for accessing a geographical database, which makes available geographical criteria (GK) and/or geographical data for the dialog-control device (3) and/or the prompt-generation device (5) and/or the voice recognition device (6) and/or the analysis unit (7).
9. A navigation system (1) with a voice-data user interface (2) as claimed in claim 8.
10. A computer program with program-coding means in order to execute all steps of the method as claimed in any one of claims 1 to 7 when the program is executed on a processor of a voice-data user interface of a navigation system.
11. A method of generating a geographical database (8) for use in a method as claimed in any one of claims 1 to 7, in which the individual data entries (DE) each have assigned to them one or more markers (M) representing a type and/or a relationship to other data entries (DE) and/or a geographical hierarchy level and/or other geographical features of the data entry (DE) concerned.
PCT/IB2004/050706 2003-05-26 2004-05-14 Method of operating a voice-controlled navigation system WO2004104520A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP04733066A EP1631791A1 (en) 2003-05-26 2004-05-14 Method of operating a voice-controlled navigation system
JP2006530859A JP2007505365A (en) 2003-05-26 2004-05-14 Voice control navigation system operation method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03101523.3 2003-05-26
EP03101523 2003-05-26

Publications (1)

Publication Number Publication Date
WO2004104520A1 true WO2004104520A1 (en) 2004-12-02

Family

ID=33462217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/050706 WO2004104520A1 (en) 2003-05-26 2004-05-14 Method of operating a voice-controlled navigation system

Country Status (4)

Country Link
EP (1) EP1631791A1 (en)
JP (1) JP2007505365A (en)
CN (1) CN1795367A (en)
WO (1) WO2004104520A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1860918A1 (en) * 2006-05-23 2007-11-28 Harman/Becker Automotive Systems GmbH Communication system and method for controlling the output of an audio signal
GB2440766A (en) * 2006-08-10 2008-02-13 Denso Corp Voice recognition controlled system for providing a disclaimer to be acknowledged before allowing operation of a vehicle navigation system
EP2003641A2 (en) * 2006-03-31 2008-12-17 Pioneer Corporation Voice input support device, method thereof, program thereof, recording medium containing the program, and navigation device
US20100235091A1 (en) * 2009-03-13 2010-09-16 Qualcomm Incorporated Human assisted techniques for providing local maps and location-specific annotated data
US8938211B2 (en) 2008-12-22 2015-01-20 Qualcomm Incorporated Providing and utilizing maps in location determination based on RSSI and RTT data
US9080882B2 (en) 2012-03-02 2015-07-14 Qualcomm Incorporated Visual OCR for positioning
WO2016133658A1 (en) * 2015-02-16 2016-08-25 Jaybridge Robotics, Inc. Assistive vehicular guidance system and method
US9500492B2 (en) 2014-03-03 2016-11-22 Apple Inc. Map application with improved navigation tools
US10113879B2 (en) 2014-03-03 2018-10-30 Apple Inc. Hierarchy of tools for navigation
CN113364920A (en) * 2021-06-09 2021-09-07 中国银行股份有限公司 Incoming line request processing method and device and electronic equipment

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1939860B1 (en) * 2006-11-30 2009-03-18 Harman Becker Automotive Systems GmbH Interactive speech recognition system
CN105302082A (en) * 2014-06-08 2016-02-03 上海能感物联网有限公司 Controller apparatus for on-site automatic navigation and car driving by non-specific person foreign language speech
CN105302079A (en) * 2014-06-08 2016-02-03 上海能感物联网有限公司 Controller apparatus for controlling on-site car driving by Chinese speech
JP6250121B1 (en) * 2016-09-16 2017-12-20 ヤフー株式会社 Map search apparatus, map search method, and map search program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6230132B1 (en) * 1997-03-10 2001-05-08 Daimlerchrysler Ag Process and apparatus for real-time verbal input of a target address of a target address system
DE19962048A1 (en) * 1999-12-22 2001-07-12 Detlef Zuendorf Voice controlled target address recognition for route guidance system for vehicle, involves entering target location using voice and outputting target by voice for verification
EP1233407A1 (en) * 2001-02-15 2002-08-21 Navigation Technologies Corporation Spatially built word list for automatic speech recognition program and method for formation thereof
EP1298415A2 (en) * 2001-09-27 2003-04-02 Robert Bosch Gmbh Navigation system with speech recognition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6230132B1 (en) * 1997-03-10 2001-05-08 Daimlerchrysler Ag Process and apparatus for real-time verbal input of a target address of a target address system
DE19962048A1 (en) * 1999-12-22 2001-07-12 Detlef Zuendorf Voice controlled target address recognition for route guidance system for vehicle, involves entering target location using voice and outputting target by voice for verification
EP1233407A1 (en) * 2001-02-15 2002-08-21 Navigation Technologies Corporation Spatially built word list for automatic speech recognition program and method for formation thereof
EP1298415A2 (en) * 2001-09-27 2003-04-02 Robert Bosch Gmbh Navigation system with speech recognition

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2003641A2 (en) * 2006-03-31 2008-12-17 Pioneer Corporation Voice input support device, method thereof, program thereof, recording medium containing the program, and navigation device
EP2003641A4 (en) * 2006-03-31 2012-01-04 Pioneer Corp Voice input support device, method thereof, program thereof, recording medium containing the program, and navigation device
EP1860918A1 (en) * 2006-05-23 2007-11-28 Harman/Becker Automotive Systems GmbH Communication system and method for controlling the output of an audio signal
US8019454B2 (en) 2006-05-23 2011-09-13 Harman Becker Automotive Systems Gmbh Audio processing system
GB2440766A (en) * 2006-08-10 2008-02-13 Denso Corp Voice recognition controlled system for providing a disclaimer to be acknowledged before allowing operation of a vehicle navigation system
US7881940B2 (en) 2006-08-10 2011-02-01 Denso Corporation Control system
GB2440766B (en) * 2006-08-10 2011-02-16 Denso Corp Control system
US8938211B2 (en) 2008-12-22 2015-01-20 Qualcomm Incorporated Providing and utilizing maps in location determination based on RSSI and RTT data
US20100235091A1 (en) * 2009-03-13 2010-09-16 Qualcomm Incorporated Human assisted techniques for providing local maps and location-specific annotated data
US8938355B2 (en) * 2009-03-13 2015-01-20 Qualcomm Incorporated Human assisted techniques for providing local maps and location-specific annotated data
US9080882B2 (en) 2012-03-02 2015-07-14 Qualcomm Incorporated Visual OCR for positioning
US9500492B2 (en) 2014-03-03 2016-11-22 Apple Inc. Map application with improved navigation tools
US10113879B2 (en) 2014-03-03 2018-10-30 Apple Inc. Hierarchy of tools for navigation
US10161761B2 (en) 2014-03-03 2018-12-25 Apple Inc. Map application with improved search tools
US11035688B2 (en) 2014-03-03 2021-06-15 Apple Inc. Map application with improved search tools
US11181388B2 (en) 2014-03-03 2021-11-23 Apple Inc. Hierarchy of tools for navigation
WO2016133658A1 (en) * 2015-02-16 2016-08-25 Jaybridge Robotics, Inc. Assistive vehicular guidance system and method
US9464913B2 (en) 2015-02-16 2016-10-11 Jaybridge Robotics, Inc. Assistive vehicular guidance system and method
CN113364920A (en) * 2021-06-09 2021-09-07 中国银行股份有限公司 Incoming line request processing method and device and electronic equipment
CN113364920B (en) * 2021-06-09 2023-01-20 中国银行股份有限公司 Incoming line request processing method and device and electronic equipment

Also Published As

Publication number Publication date
JP2007505365A (en) 2007-03-08
CN1795367A (en) 2006-06-28
EP1631791A1 (en) 2006-03-08

Similar Documents

Publication Publication Date Title
US6411893B2 (en) Method for selecting a locality name in a navigation system by voice input
US6598018B1 (en) Method for natural dialog interface to car devices
EP1233407B1 (en) Speech recognition with spatially built word list
US7184957B2 (en) Multiple pass speech recognition method and system
US7328155B2 (en) Method and system for speech recognition using grammar weighted based upon location information
EP2226793B1 (en) Speech recognition system and data updating method
US8996385B2 (en) Conversation system and conversation software
EP1050872A2 (en) Method and system for selecting recognized words when correcting recognized speech
US20080177541A1 (en) Voice recognition device, voice recognition method, and voice recognition program
US20080059199A1 (en) In-vehicle apparatus
EP1631791A1 (en) Method of operating a voice-controlled navigation system
US7209884B2 (en) Speech input into a destination guiding system
CN108871370A (en) Air navigation aid, device, equipment and medium
US20120253822A1 (en) Systems and Methods for Managing Prompts for a Connected Vehicle
GB2422011A (en) Vehicle navigation system and method using speech
KR100770644B1 (en) Method and system for an efficient operating environment in a real-time navigation system
JP2001022779A (en) Interactive information retrieval device, method for interactive information retrieval using computer, and computer-readable medium where program performing interactive information retrieval is recorded
JPH0764480A (en) Voice recognition device for on-vehicle processing information
US20090210144A1 (en) Method for selecting a destination
JP3645104B2 (en) Dictionary search apparatus and recording medium storing dictionary search program
Bernsen On-line user modelling in a mobile spoken dialogue system.
WO2006028171A1 (en) Data presentation device, data presentation method, data presentation program, and recording medium containing the program
KR200328847Y1 Geographical information provider which gives user's previously input schedule together
JP4822993B2 (en) Point search device and navigation device
KR100465827B1 Geographical information provider which gives user's previously input schedule together

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2004733066

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2006530859

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 20048143866

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 2004733066

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2004733066

Country of ref document: EP