US20170039124A1 - Method and apparatus for interception of synchronization objects in graphics application programming interfaces for frame debugging

Info

Publication number
US20170039124A1
Authority
US
United States
Prior art keywords
synchronization object
signaling
waiting
tasks
application
Prior art date
Legal status
Granted
Application number
US14/845,123
Other versions
US9910760B2 (en)
Inventor
Jeffrey Kiel
Dan Price
Mike Strauss
Current Assignee
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date
2015-08-07
Filing date
Publication date
Application filed by Nvidia Corp
Priority to US14/845,123
Assigned to NVIDIA CORPORATION reassignment NVIDIA CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KIEL, JEFFREY, PRICE, DAN, STRAUSS, MIKE
Publication of US20170039124A1
Application granted
Publication of US9910760B2
Legal status: Active
Adjusted expiration

Classifications

    • G PHYSICS / G06 COMPUTING; CALCULATING OR COUNTING / G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/362 Software debugging
    • G06F11/3636 Software debugging by tracing the execution of the program
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 Mutual exclusion algorithms
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3664 Environments for testing or debugging software
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites

Definitions

  • Embodiments of the present invention include a method for performing application-based synchronization between two or more processors, in which a plurality of processing tasks are assigned to and performed in a plurality of processors.
  • The method uses a waiting synchronization object to suspend performance of a subsequent plurality of processing tasks until a separate signaling synchronization object is signaled as completed; an interception layer then propagates the signal to the waiting synchronization object.
  • The pair of synchronization objects is created by the interception layer but appears to the application as a single synchronization object.
  • Another embodiment includes a method for performing application-based frame debugging in which two pairs of synchronization objects are used: the first pair intercepts, captures, and records signals before propagating them to the second (interior) pair, which performs the wait, propagate, and signal functionality described above.
  • Yet another embodiment includes a system for performing the methods described above that includes a memory device and a plurality of processors, collectively executing the application, drivers of at least one of the plurality of processors, and an interception layer that performs application-based synchronization and frame debugging.
  • FIG. 1 is a diagram that depicts an exemplary stack configuration for application data flow, in accordance with various aspects of the present invention.
  • FIG. 2 depicts a flowchart of an exemplary computer-controlled process for performing application-based synchronization between two or more processors with synchronization objects, in accordance with various embodiments of the present invention.
  • FIG. 3 depicts a flowchart of an exemplary computer-controlled process for performing debugging using paired synchronization objects, in accordance with various aspects of the present invention.
  • FIG. 4 is a diagram that depicts an exemplary computing system, in accordance with embodiments of the present invention.
  • Embodiments of the claimed subject matter are presented to provide a novel system and method for intercepting synchronization operations, such as those that are performed using a fence primitive, and detecting the order in which tasks are executed on one or more processors.
  • A processor may be physical, logical, or virtual; a process, thread, or work queue; a CPU or GPU; or any other component of a computer system capable of executing work. Additional aspects of the claimed subject matter may be extended to provide capabilities for capturing and replaying such tasks for the purpose of frame debugging and the like.
  • FIG. 1 is a diagram that depicts an exemplary configuration of a frame debugger interception stack, in accordance with various aspects of the present invention.
  • An application (101), executed by, for example, a processor in a computing system, generates and issues graphics commands via functions and methods during operation.
  • A runtime and/or driver that implements a graphics API (105) receives such commands and sends them to the GPU (107).
  • Data can flow from the application level to the GPU and from the GPU to the application, as indicated by the bidirectional dataflow.
  • A system containing an interception layer for frame debugging includes an interception layer (103). This layer intercepts commands specified by the application.
  • The interception layer can, among other things, shadow state changes made by the commands, record the commands, forward the commands on to the runtime and/or driver (105), forward modified commands to the runtime and/or driver (105), and issue additional commands to the runtime and/or driver (105).
  • Two or more processors may use a fence or other synchronization primitive to order work, as depicted in process 200 of FIG. 2.
  • Processor 0 performs some work (201) before signaling a fence (207) to indicate that the work has been completed. After signaling the fence, processor 0 continues to perform more work (211).
  • Processor 1 has a workload (203) that, based on the usage of the fence, can be assumed to be independent of any work being done by processor 0. Processor 1 may execute this workload at any time before, during, or after processor 0 executes (201, 207, or 211). Processor 1 then waits on the fence (205).
  • This blocks processor 1 from doing more work until after processor 0 signals the fence (207) and the signal is made visible to processor 1 (209). Once the signal is visible to processor 1, processor 1 can perform additional work (213) that, based on the usage of the fence, is likely dependent on work performed in (201). The exact timing and nature of how the signal is made visible (209) is typically opaque. This can present problems for a frame debugger interception layer that needs to know the exact timing and ordering of events executed on one or more processors.
  • A frame debugger interception layer may operate in different modes.
  • One mode is known as “running” mode.
  • In running mode, the application runs normally, although all commands are passed through the interception layer.
  • The interception layer may make minor modifications to commands for compatibility or tracking reasons, or to enable the interception layer to expose real-time information to the user.
  • A pair of modes known as “capture” mode and “replay” mode implement frame debugging functionality.
  • Frame debugging allows a user to capture one or more frames of graphics commands, and then replay them in a loop. This allows the user to inspect individual graphics commands, and to observe and verify their output with the intent of uncovering the source of application program errors.
  • Capturing graphics commands may be performed by using function bundles.
  • Each function bundle may represent the tokenization or unitization of a function or method call to the 3D graphics API.
  • Such tokenization includes an ID (e.g., a value) that indicates which function or method the command corresponds to, and the parameters used by the function.
  • A function bundle is recorded each time a function or method is called by the application.
  • A frame debugger interception layer may respond to an application request to create a single synchronization object with signaling and waiting capabilities, such as a fence, by creating two fences internal to the interception layer. These fences are used to implement the application's notion of a fence object in running mode. One fence is known as the “signaling” fence and the other as the “waiting” fence. This detail is opaque to the application, which sees a single fence as if the interception layer were not in place.
  • When the application signals the fence, the interception layer applies the signal to the signaling fence.
  • When the application waits on or observes the fence, the interception layer applies the operation to the waiting fence.
  • When the interception layer sees a signal operation, it uses available mechanisms from the API to monitor or listen for the fence to complete to the specified value.
  • The signaling fence may have a value that corresponds to the state of progress of a particular processor working on a set of tasks or operations.
  • The waiting fence likewise has a value that corresponds to the state of progress as indicated by the signaling fence and as processed by the interception layer.
  • The current state (value) of the application's notion of the fence is based on the interception layer's waiting fence.
  • The current state or value of the application's notion of the fence may therefore differ from the value the application associates with the tasks it has already submitted or assigned for execution.
  • The interception layer knows when the signaling fence has completed (reached a certain value).
  • At that point, the interception layer may do additional work such as data or task verification, logging, consistency checks, or other similar tasks for the purposes of data analysis and/or frame debugging. Following such operations, the interception layer forwards the signal to the waiting fence, which allows the application to proceed, and processors waiting on the fence are unblocked.
  • FIG. 3 depicts an alternate approach and describes a process 300 for synchronization object processing.
  • FIG. 3 is similar to FIG. 2; however, step 209 has been replaced by steps 309, 311, 313, and 319.
  • The signal operation (307) executed by processor 0 happens on the interception layer's signaling fence.
  • The interception layer monitors this fence and receives the signal (309).
  • The interception layer may perform necessary or desired updates (311), after which the signal is propagated to the waiting fence (313).
  • The signal on the waiting fence (319) is received by processor 1, unblocking it. Processor 1 is then free to continue executing other work (317).
  • The frame debugging process uses a second pair of fence objects.
  • The pair of fence objects in use while the interception layer is in running mode may be implemented by the underlying runtime/driver in such a way that “replaying” a signal value (i.e., signaling the fence with a previously used value) may lead to incorrect behavior.
  • In addition, an application's use of a fence may be incompatible with replaying a signal value.
  • For example, the application may be designed to generate new work when a signal of a given value is received or observed by a processor, and that work may be intended to be generated only once.
  • The capture/replay process uses a second pair of fence objects to avoid such incompatibilities.
  • This system provides an interception layer and frame debugger to correctly track the fence usage of an application.
  • When the user indicates that the interception layer should enter frame debugging (capture/replay) mode, the interception layer internally redirects all application fence operations from the running mode signal/wait fence pair to the frame debugging pair. This may require bootstrapping the frame debugging pair by artificially signaling the fences to particular values that reflect the application's current progress.
  • When the user leaves frame debugging mode, the interception layer redirects all application fence operations to the original (running mode) pair of fences until the next mode change. The user can transition from running mode to frame debugging mode and back as many times as desired.
  • Correct replay of the application's commands as recorded in function bundles may depend on detecting when the application has made a decision by observing the value of a fence object.
  • Knowing the order of application-specified commands relative to the time that a fence signal completes during capture mode allows the interception layer to maintain this ordering in replay mode.
  • This order is maintained during replay mode by inserting an artificial function bundle into the stream of function bundles at the time the interception layer receives a signal from the signaling fence during capture mode. This is done before propagating the signal to the waiting fence, so that any work dependent on the signal is captured after the artificial function bundle.
  • Application-specified behavior of this kind is processed as intended with a two-fence implementation in the interception layer. Additionally, when capturing one or more frames of operations, a frame debugger interception layer is able to correctly capture the order and timing of 1) the application signaling a fence, 2) the associated processor completing the work and the fence signaling or updating its value, 3) application operations that monitor or observe the value of the fence, and 4) application operations that request that a processor wait on a fence. Depending on the API, the interception layer is also able to properly record the order of operations triggered via callbacks associated with the signaling of a fence.
  • The captured application-specified behavior can be replayed while maintaining the same order of operations.
  • The interception layer knows the order of signal, monitor, and wait operations, in addition to knowing when the fence has actually been signaled. Knowing that the fence has been signaled is possible because the interception layer is always the first layer of software above the driver stack to become aware that a fence signal has completed. The interception layer notifies other layers by propagating the signal to the waiting fence.
  • Additional information collected during the frame capture and replay process may be used to detect improper fence usage.
  • Knowledge of resource production and consumption by particular processors allows the frame debugger interception layer to know when synchronization must occur in order to produce correct results. Since the interception layer knows all the details of the application's intended synchronization operations, it can determine whether synchronization operations are missing, that is, operations the application should issue in order to be correct but is not currently issuing. Such a condition would be an application bug that the frame debugger interception layer is able to report to the user. In the absence of such an automatic detection mechanism, a basic display of fence operations and resource operations can inform a user about improper fence usage. Additionally, the frame debugger interception layer may detect situations where a fence is used unnecessarily.
  • An exemplary computer system 400 upon which embodiments of the present invention may be implemented (such as processes 200 and 300 described above) includes a general-purpose computing system environment.
  • Computing system 400 typically includes at least one processing unit 401 and memory, and an address/data bus 409 (or other interface) for communicating information.
  • Memory may be volatile (such as RAM 402), non-volatile (such as ROM 403, flash memory, etc.), or some combination of the two.
  • Computer system 400 may also comprise an optional graphics subsystem 405 for presenting information to the computer user, e.g., by displaying information on an attached display device 410.
  • The processing of one or more tasks (e.g., commands and instructions) of an application executing in computer system 400 may be performed, in whole or in part, by graphics subsystem 405 in conjunction with processor 401 and memory 402.
  • A first portion of a plurality of tasks may be assigned by the application to processor 401, with a second portion of the tasks, dependent on one or more tasks of the first portion, assigned to be performed by graphics subsystem 405.
  • Alternatively, the first and second portions may be assigned to two or more processors 401, two or more graphics subsystems 405, or any combination thereof.
  • Computing system 400 may also have additional features/functionality.
  • For example, computing system 400 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape.
  • Such additional storage is illustrated in FIG. 4 by data storage device 407.
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • RAM 402, ROM 403, and data storage device 407 are all examples of computer storage media.
  • Computer system 400 also comprises an optional alphanumeric input device 406, an optional cursor control or directing device 407, and one or more signal communication interfaces (input/output devices, e.g., a network interface card) 409.
  • Optional alphanumeric input device 406 can communicate information and command selections to central processor 401.
  • Optional cursor control or directing device 407 is coupled to bus 409 for communicating user input information and command selections to central processor 401.
  • Signal communication interface (input/output device) 409, also coupled to bus 409, can be a serial port.
  • Communication interface 409 may also include wireless communication mechanisms.
  • Computer system 400 can be communicatively coupled to other computer systems over a communication network such as the Internet or an intranet (e.g., a local area network), or can receive data (e.g., a digital television signal).
  • Embodiments described herein provide a new approach for synchronizing application processing tasks and for debugging and analyzing the discretized, tokenized units (function bundles) produced during the execution of those tasks. Advantages of the invention described herein include more efficient parallel processing while still maintaining sequential order and avoiding data hazards, achieved by using separate, non-blocking fence primitives.
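The missing-synchronization check described above (a resource produced on one processor and consumed on another with no fence ordering between them) is not specified in detail in the source. One well-known way to approach it, swapped in here as an assumption rather than as the patent's method, is vector-clock happens-before tracking over the captured event stream. In this hypothetical Python sketch, the function name `find_missing_syncs`, the tuple event format, and the premise that events are listed in the order they take effect (a wait appears at the point it is satisfied) are all illustrative assumptions; only read-after-write hazards are checked.

```python
from collections import defaultdict


def find_missing_syncs(events):
    """Return (event_index, resource, writer, reader) for every read that is
    not ordered after the last cross-processor write by a fence signal/wait.

    `events` is a list of tuples in the order they take effect:
      ("write", proc, resource)       ("read", proc, resource)
      ("signal", proc, fence, value)  ("wait", proc, fence, value)
    """
    clocks = defaultdict(lambda: defaultdict(int))  # clocks[proc][other] -> logical time
    fence_clocks = defaultdict(dict)                # fence_clocks[fence][value] -> clock snapshot
    last_write = {}                                 # resource -> (writer, writer's time)
    hazards = []

    for index, (kind, proc, *args) in enumerate(events):
        clock = clocks[proc]
        clock[proc] += 1  # every event advances its own processor's clock
        if kind == "write":
            (resource,) = args
            last_write[resource] = (proc, clock[proc])
        elif kind == "read":
            (resource,) = args
            if resource in last_write:
                writer, stamp = last_write[resource]
                # Hazard: the reader has not (transitively) waited on a signal
                # that the writer issued after performing the write.
                if writer != proc and clock[writer] < stamp:
                    hazards.append((index, resource, writer, proc))
        elif kind == "signal":
            fence, value = args
            fence_clocks[fence][value] = dict(clock)
        elif kind == "wait":
            fence, value = args
            # Any already-recorded signal with a value >= `value` satisfies the wait.
            for signaled_value, snapshot in fence_clocks[fence].items():
                if signaled_value >= value:
                    for other, stamp in snapshot.items():
                        clock[other] = max(clock[other], stamp)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return hazards
```

For example, a write on `gpu0` followed by a signal, a matching wait on `gpu1`, and then a read on `gpu1` reports nothing, while dropping the signal/wait pair, or signaling before the write, reports the read as a missing synchronization. A real frame debugger would also need write-after-read and write-after-write checks and a faithful model of the API's queue semantics.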

Abstract

An aspect of the present invention proposes a solution for correctly intercepting, capturing, and replaying tasks (such as functions and methods) in an interception layer operating between an application programming interface (API) and the driver of a processor by using synchronization objects such as fences. According to one or more embodiments of the present invention, the application uses what appears to the application to be a single synchronization object to signal (from a processor) and to wait (on a processor), but which is actually two separate synchronization objects in the interception layer. According to one or more embodiments, the solution proposed herein may be implemented as part of a module or tool that works as an interception layer between an application and an API exposed by a device driver of a resource, and allows for an efficient and effective approach to frame debugging and live capture and replay of function bundles.
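The single-object-backed-by-two arrangement can be sketched as follows. This is a hypothetical Python model, not NVIDIA's implementation or any graphics API: `Fence` stands in for a fence implemented by the underlying API, `InterceptedFence` for the object the interception layer hands back to the application, `on_signal_landed` for the layer's own bookkeeping hook, and a helper thread for whatever completion-notification mechanism the real API offers. In this model a signal completes as soon as it is issued, whereas on a real GPU queue it would complete later.

```python
import threading


class Fence:
    """Minimal stand-in for a fence implemented by the underlying graphics API."""

    def __init__(self, initial_value=0):
        self._value = initial_value
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            self._value = max(self._value, value)  # value treated as non-decreasing
            self._cond.notify_all()

    def completed_value(self):
        with self._cond:
            return self._value

    def wait(self, value, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= value, timeout)


class InterceptedFence:
    """What the application sees as one fence, backed by two internal fences."""

    def __init__(self, on_signal_landed=None, initial_value=0):
        self.signaling = Fence(initial_value)  # receives application signals
        self.waiting = Fence(initial_value)    # serves application waits/observations
        self._on_signal_landed = on_signal_landed

    def signal(self, value):
        # Application signal -> signaling fence; the layer then monitors it.
        self.signaling.signal(value)
        threading.Thread(target=self._monitor, args=(value,), daemon=True).start()

    def _monitor(self, value):
        # Stand-in for an API completion notification: block until the
        # signaling fence reaches `value`.
        self.signaling.wait(value)
        # The layer's own work (logging, checks, capture) happens here, while
        # the application still sees the old value on the waiting fence.
        if self._on_signal_landed is not None:
            self._on_signal_landed(value, self.waiting.completed_value())
        # Only now is the signal propagated, unblocking application waiters.
        self.waiting.signal(value)

    def completed_value(self):
        return self.waiting.completed_value()

    def wait(self, value, timeout=None):
        return self.waiting.wait(value, timeout)
```

Because `completed_value` and `wait` read only the waiting fence, the application cannot observe a signal until the layer has finished its hook, which is the property the interception approach relies on. With several signals in flight, this sketch does not guarantee that the hook runs in signal order; a real layer would need to handle that.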

Description

    CLAIM OF PRIORITY
  • This application claims the benefit of U.S. provisional patent application No. 62/202,743 filed Aug. 7, 2015 to Kiel et al., and which is incorporated herein in its entirety by reference.
    BACKGROUND OF THE INVENTION
  • Debugging is a well-known process for finding the causes of undesirable operations in computer applications and modules. The undesirable operations may include, but are not limited to, unexpected behavior such as extended delays (“freezing”), unintended repetition (“looping”), unintended termination (“crashing”), or problems in the storage and/or manipulation of data, such as data discrepancies, memory faults, or anomalies. Typically the undesirable operations are caused by errors (“bugs”) in the application or module software.
  • In the case of computer graphics applications, the process of debugging may be made more complex by the use of heterogeneous computing systems that include both CPUs and GPUs. Additionally, debugging may be complicated by asynchronous processing on such systems, large datasets, and the need to have visibility into the complex state machines implemented by one or more GPUs. A frame debugger is a tool that allows users to inspect state/data at various points in a set of graphics frames with the intent of uncovering application bugs that produce incorrect rendering or other unintended behavior. Such bugs may be a result of program errors such as improperly configured state, incorrect operations sent to the GPU, corrupt data, or data hazards (often by consuming data before it has been produced). A frame debugger may capture (record) and replay the graphics operations generated by an application to enable such inspection.
  • The functionality provided by one or more GPUs or graphics systems is exposed using 3D application programming interfaces (APIs). Traditionally the runtimes and drivers that implement such APIs manage the complexity of potential data hazards internally, freeing the application developer from the need to worry about such complexity. A more recent industry practice has shifted the burden of resource management, data hazard management, and operation synchronization across processors to the application. This is done via APIs designed to expose such functionality.
  • A conventional mechanism for ordering or synchronizing operations with data dependencies across two or more processors (homogeneous, heterogeneous, physical, logical, virtual, etc.) is to use synchronization objects or primitives. Such objects allow one processor to communicate with one or more other processors when a workload (set of tasks or operations) has completed. A fence object is an example of such a synchronization primitive. A processor can wait on a fence object, effectively blocking the processor from continuing any work, until the fence is signaled by another processor. A fence typically encapsulates a value that can be observed by processors, allowing the processors or application to make decisions about what workloads to execute based on the current progress made by other processors as indicated by the fence value. These kinds of synchronization primitives are exposed by modern 3D graphics APIs to aid in synchronizing work across CPUs and GPU engines.
  • Correct programming in a multi-processor environment is inherently complex. A set of bugs arising from incorrect fence usage includes, but is not limited to, data being consumed before it has been produced (no fence used or fence improperly used), less than optimal utilization of processors as a result of unnecessary fence waiting, processor hangs, and application or other system crashes. A graphics frame debugger that does not properly detect and replicate an application's use of fences will, at a minimum, have trouble replaying the application's sequence of events in a consistent and well-ordered way. Additionally, it will not be able to provide feedback to users about potential erroneous fence usage without accurately tracking fence operations.
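The signal/wait/observe behavior of a fence described above can be modeled in a few lines. The following Python sketch is hypothetical and does not reflect any particular graphics API: the `Fence` class, its `signal`, `wait`, and `completed_value` methods, and the use of threads to stand in for processors are assumptions made for illustration, and the fence value is treated as non-decreasing.

```python
import threading


class Fence:
    """Hypothetical model of a fence: a value that one processor signals and
    other processors observe or wait on."""

    def __init__(self, initial_value=0):
        self._value = initial_value
        self._cond = threading.Condition()

    def signal(self, value):
        # Record that work up to `value` has completed and wake any waiters.
        with self._cond:
            self._value = max(self._value, value)
            self._cond.notify_all()

    def completed_value(self):
        # Non-blocking observation of the current fence value.
        with self._cond:
            return self._value

    def wait(self, value, timeout=None):
        # Block the calling thread ("processor") until the fence reaches `value`.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= value, timeout)


def run_producer_consumer():
    """Two threads stand in for a producing and a consuming processor."""
    fence = Fence()
    log = []
    log_lock = threading.Lock()

    def record(entry):
        with log_lock:
            log.append(entry)

    def processor0():
        record("p0: produce data")      # work that processor 1 depends on
        fence.signal(1)                 # signal that the data is ready
        record("p0: more work")         # independent work after the signal

    def processor1():
        record("p1: independent work")  # may run before, during, or after p0
        fence.wait(1)                   # blocked until processor 0 signals 1
        record("p1: consume data")      # dependent work, now safe to run

    threads = [threading.Thread(target=processor1),
               threading.Thread(target=processor0)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The interleaving of the log varies from run to run, but "p1: consume data" always follows "p0: produce data". Because `wait` tests `>=`, any signal at or above the awaited value releases the waiter; whether a given API behaves the same way should be checked against its documentation.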
  • SUMMARY OF THE INVENTION
  • This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
  • An aspect of the present invention proposes a system for correctly intercepting fence operations and detecting the order in which tasks (specified via functions and methods exposed by the API) are executed on one or more processors. According to one or more embodiments of the present invention, what appears to be a single fence to the application is implemented by a frame debugging interception layer as two separate fence objects. These objects are, in turn, implemented by the underlying graphics API. One of these fence objects is known to the frame debugging interception layer as a signaling fence, while the other is known as the waiting fence. Application operations that signal the fence end up operating on the underlying signaling fence, while application operations that observe or wait on the fence end up operating on the waiting fence. The interception layer is responsible for detecting completion of work as indicated by the signaling fence, and propagating this information to the waiting fence.
  • According to another aspect of the present invention, the system may be extended to provide capabilities for capturing and replaying tasks for purposes such as frame debugging. In embodiments that perform frame capture and replay, a second pair of synchronization objects is used. To ensure that frame replay honors the time at which a synchronization mechanism lands or completes, artificial function bundles (structures for tracking which functions or methods an application has called to issue graphics work) are inserted into the stream of captured function bundles. These function bundles represent the point at which the interception layer is first made aware that the signal has completed. The function bundles may, for example, instruct the replay system to wait for such synchronization to complete, as function bundles captured after this point may have been ordered according to the synchronization operation. At the beginning and end of each captured frame, all unblocked work submitted via the graphics API is forced to complete. This ensures that all signals land as intended.
  • More specifically, embodiments of the present invention include a method for performing application-based synchronization between two or more processors, in which a plurality of processing tasks are assigned to and performed in a plurality of processors. The method suspends, via a waiting synchronization object, the performance of a subsequent plurality of processing tasks until a separate signaling synchronization object is signaled as completed and an interception layer propagates the signal to the waiting synchronization object. In such an embodiment, the pair of synchronization objects is created by the interception layer but appears as a single synchronization object to the application.
  • According to a second embodiment, a method for performing application-based frame-debugging is also provided, in which two pairs of synchronization objects are used, with the first pair of synchronization objects being used to intercept, capture, and record signals before propagating the signals to the second (interior) pair of synchronization objects, which are used to perform the wait, propagate, and signal functionality described above.
  • Yet another embodiment includes a system for performing the methods described above that includes a memory device and a plurality of processors, collectively executing the application, drivers of at least one of the plurality of processors, and an interception layer that performs application-based synchronization and frame-debugging.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings are incorporated in and form a part of this specification. The drawings illustrate embodiments. Together with the description, the drawings serve to explain the principles of the embodiments:
  • FIG. 1 is a diagram that depicts an exemplary stack configuration for application data flow, in accordance with various aspects of the present invention.
  • FIG. 2 depicts a flowchart of an exemplary computer-controlled process for performing application-based synchronization between two or more processors with synchronization objects, in accordance with various embodiments of the present invention.
  • FIG. 3 depicts a flowchart of an exemplary computer-controlled process for performing debugging using paired synchronization objects, in accordance with various aspects of the present invention.
  • FIG. 4 is a diagram that depicts an exemplary computing system, in accordance with embodiments of the present invention.
  • DETAILED DESCRIPTION
  • Reference will now be made in detail to the preferred embodiments of the claimed subject matter, a method and system for the use of a computing system, examples of which are illustrated in the accompanying drawings. While the claimed subject matter will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit these embodiments. On the contrary, the claimed subject matter is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope as defined by the appended claims.
  • Furthermore, in the following detailed descriptions of embodiments of the claimed subject matter, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. However, it will be recognized by one of ordinary skill in the art that the claimed subject matter may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the claimed subject matter.
  • Some portions of the detailed descriptions which follow are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits that can be performed on computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer generated step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
  • It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present claimed subject matter, discussions utilizing terms such as “storing,” “creating,” “protecting,” “receiving,” “encrypting,” “decrypting,” “destroying,” or the like, refer to the action and processes of a computer system or integrated circuit, or similar electronic computing device, including an embedded system, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
  • Synchronization Objects
  • Embodiments of the claimed subject matter are presented to provide a novel system and method for intercepting synchronization operations, such as those that are performed using a fence primitive, and detecting the order in which tasks are executed on one or more processors. Here a processor may be physical, logical, or virtual, a process, thread, or work queue, a CPU or GPU, or other such computer system capable of executing work. Additional aspects of the claimed subject matter may be extended to provide capabilities for capturing and replaying such tasks for the purpose of frame debugging and the like.
  • FIG. 1 is a diagram that depicts an exemplary configuration of a frame debugger interception stack, in accordance with various aspects of the present invention. As depicted in FIG. 1, an application 101 (executed by, for example, a processor in a computing system) generates and issues graphics commands via functions and methods during operation. In a conventional stack, a runtime and/or driver that implements a graphics API (105) receives such commands and sends them to the GPU (107). Data can flow from the application level to the GPU and from the GPU to the application, as indicated by the bidirectional data flow.
  • According to one or more embodiments, a frame debugging system includes an interception layer (103) that intercepts commands issued by the application. The interception layer can, among other things, shadow state changes made by the commands, record the commands, forward the commands on to the runtime and/or driver (105), forward modified commands to the runtime and/or driver (105), and issue additional commands to the runtime and/or driver (105).
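  • As a rough illustration of the interception layer's role, the sketch below wraps a hypothetical driver object (`FakeDriver`, invented for this example), records each command, shadows state changes made by `set_*` calls, and forwards the command unchanged. Forwarding modified commands or issuing additional ones would hook into the same `forward` function; that part is omitted here.

```python
from typing import Any, Callable, Dict, List, Tuple


class FakeDriver:
    """Hypothetical stand-in for the runtime/driver (105); not a real graphics API."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, tuple]] = []

    def draw(self, vertex_count: int) -> None:
        self.received.append(("draw", (vertex_count,)))

    def set_blend_state(self, enabled: bool) -> None:
        self.received.append(("set_blend_state", (enabled,)))


class InterceptionLayer:
    """Sits between the application (101) and the driver (105)."""

    def __init__(self, driver: Any) -> None:
        self._driver = driver
        self.recorded: List[Tuple[str, tuple]] = []
        self.shadow_state: Dict[str, Any] = {}

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only called for names not defined on the layer itself, i.e. API calls.
        target = getattr(self._driver, name)

        def forward(*args: Any) -> Any:
            self.recorded.append((name, args))          # record the command
            if name.startswith("set_"):
                self.shadow_state[name[4:]] = args      # shadow the state change
            return target(*args)                        # forward it unchanged

        return forward
```

The application calls `layer.draw(3)` exactly as it would call the driver, which keeps the layer invisible to it.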
  • In a non-intercepted system, two or more processors may use a fence or other synchronization primitive to order work as depicted in the process 200 of FIG. 2. In this diagram, processor 0 performs some work (201) before signaling a fence (207) to indicate that the work has been completed. After signaling the fence, processor 0 continues to perform more work (211). Processor 1 has a workload (203) that can be assumed to be independent of any work being done by processor 0 based on the usage of the fence. Processor 1 may execute this workload at any time before, during, or after processor 0 executes (201, 207, or 211). Processor 1 then waits on the fence (205). This blocks processor 1 from doing more work until after processor 0 signals the fence (207), and the signal is made visible to processor 1 (209). Once the signal is visible to processor 1, processor 1 can perform additional work (213) that is, based on the usage of the fence, likely dependent on work performed in (201). The exact timing and nature of how the signal is made visible (209) is typically opaque. This can present problems for a frame debugger interception layer that needs to know the exact timing and ordering of events that are executed on one or more processors.
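  • The ordering guarantee of process 200 can be modeled with two threads standing in for processor 0 and processor 1. This is a simplified sketch that assumes a hypothetical condition-variable fence; the log entries reuse the step numbers of FIG. 2. Whatever the interleaving, the fence ensures that step 213 is logged after step 201, while step 203 may land anywhere before 213.

```python
import threading
from typing import List


class Fence:
    """Hypothetical CPU-side fence model (not a real graphics API object)."""

    def __init__(self) -> None:
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value: int) -> None:
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def wait(self, value: int) -> None:
        with self._cond:
            self._cond.wait_for(lambda: self._value >= value)


def run_fig2() -> List[str]:
    fence = Fence()
    log: List[str] = []
    lock = threading.Lock()

    def record(event: str) -> None:
        with lock:
            log.append(event)

    def processor0() -> None:
        record("p0: work 201")
        fence.signal(1)                 # 207
        record("p0: work 211")

    def processor1() -> None:
        record("p1: work 203")          # independent of processor 0
        fence.wait(1)                   # 205, released once the signal is visible (209)
        record("p1: work 213")          # may depend on work 201

    threads = [threading.Thread(target=processor0), threading.Thread(target=processor1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```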
  • According to one or more embodiments, a frame debugger interception layer may operate in different modes. In one such embodiment, one mode is known as “running” mode. In running mode the application runs normally, although with all commands being passed through the interception layer. The interception layer may make minor modifications to commands for compatibility or tracking reasons, or to enable the interception layer to expose real-time information to the user. In one such embodiment, a pair of modes known as “capture” mode and “replay” mode implement frame debugging functionality. Frame debugging allows a user to capture one or more frames of graphics commands, and then replay them in a loop. This allows the user to inspect individual graphics commands, and to observe and verify their output with the intent of uncovering the source of application program errors.
  • In one or more embodiments, capturing graphics commands may be performed by using function bundles. Each function bundle may represent the tokenization or unitization of a function or method call to the 3D graphics API. Such tokenization includes an ID (e.g., a value) that indicates which function or method the command corresponds to, and the parameters used by the function. During capture mode, a function bundle is recorded each time a function or method is called by the application.
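  • A function bundle might be represented as follows. The IDs in `FuncId` and the `CaptureRecorder` class are hypothetical names chosen for this sketch; the only properties taken from the description are that each bundle carries a function ID plus the call's parameters, and that one bundle is recorded per call while capturing.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, List, Tuple


class FuncId(IntEnum):
    # Hypothetical IDs; a real layer would enumerate every API entry point.
    CREATE_FENCE = 1
    SIGNAL_FENCE = 2
    WAIT_FENCE = 3
    DRAW = 4


@dataclass(frozen=True)
class FunctionBundle:
    func_id: FuncId          # which function or method was called
    params: Tuple[Any, ...]  # the parameters it was called with


class CaptureRecorder:
    def __init__(self) -> None:
        self.capturing = False
        self.bundles: List[FunctionBundle] = []

    def on_call(self, func_id: FuncId, *params: Any) -> None:
        # In capture mode, one bundle is appended per application call.
        if self.capturing:
            self.bundles.append(FunctionBundle(func_id, params))
```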
  • According to one or more embodiments, a frame debugger interception layer may respond to an application request to generate (create) a single synchronization object with signaling and waiting capabilities, such as a fence, by creating two fences internal to the interception layer. These fences are used to implement the application's notion of a fence object in running mode. One fence is known as the “signaling” fence and the other is known as the “waiting” fence. This detail is opaque to the application, which sees a single fence as if the interception layer was not in place. When the application issues a command that would signal a fence, the interception layer applies it to the signaling fence. When the application issues a command to monitor or wait on a fence, the interception layer applies it to the waiting fence. When the interception layer sees a signal operation, the interception layer uses available mechanisms from the API to monitor or listen for the fence to complete to the specified value.
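  • The split described above can be sketched as a wrapper object that the application treats as one fence. `Fence` and `AppFence` are hypothetical names, and the underlying fence is a threading model rather than a real API object. Signals land on the signaling fence, while waits and value queries go to the waiting fence, so waiters stay blocked until the interception layer forwards the signal.

```python
import threading
from typing import Optional


class Fence:
    """Hypothetical stand-in for a fence implemented by the underlying API."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.cond = threading.Condition()

    def signal(self, value: int) -> None:
        with self.cond:
            self.value = value
            self.cond.notify_all()

    def wait(self, value: int, timeout: Optional[float] = None) -> bool:
        with self.cond:
            return self.cond.wait_for(lambda: self.value >= value, timeout)


class AppFence:
    """The single fence the application sees; internally a signaling/waiting pair."""

    def __init__(self) -> None:
        self.signaling = Fence()   # target of application signal commands
        self.waiting = Fence()     # target of application wait and query commands

    def signal(self, value: int) -> None:
        self.signaling.signal(value)

    def wait(self, value: int, timeout: Optional[float] = None) -> bool:
        return self.waiting.wait(value, timeout)

    def completed_value(self) -> int:
        with self.waiting.cond:
            return self.waiting.value
```

In this sketch the propagation step (`waiting.signal`) is left to the caller, mirroring the fact that the interception layer, not the application, performs it.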
  • According to such embodiments, the signaling fence may have a value that corresponds to the state of progress of a particular processor working on a set of tasks or operations. The waiting fence likewise has a value that corresponds to the state of progress as indicated by the signaling fence and as processed by the interception layer. In one or more embodiments the current state (value) of the application's notion of a fence is based on the interception layer's waiting fence. The current state or value of the application's notion of the fence may include a different value or state that corresponds to the application's notion of the already submitted or assigned tasks to be performed. In such embodiments, the interception layer knows when the signaling fence has completed (reached a certain value). When this happens, the interception layer may do additional work such as data or task verification, logging, consistency checks, or any other similar tasks for the purposes of data analysis and/or frame debugging. Following such operations, the interception layer forwards the signal on to the waiting fence, which allows the application to proceed. Processors waiting on the fence are unblocked.
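  • The relationship between these values can be sketched without threads. `FencePairState` is a hypothetical bookkeeping class: `submitted` tracks what the application has asked to be signaled, `signaling` what the signaling fence has reached, and `waiting` what has been forwarded and is therefore visible to the application. Hooks (verification, logging, consistency checks) run before the waiting value changes.

```python
from typing import Callable, List


class FencePairState:
    """Tracks the values of one application fence (hypothetical model)."""

    def __init__(self, hooks: List[Callable[[int], None]]) -> None:
        self.submitted = 0   # highest value the application has asked to be signaled
        self.signaling = 0   # value the signaling fence has actually reached
        self.waiting = 0     # value forwarded to the waiting fence; what the app observes
        self._hooks = hooks  # e.g. verification, logging, consistency checks

    def app_signal(self, value: int) -> None:
        self.submitted = value

    def on_signaling_complete(self, value: int) -> None:
        # Called when the layer learns that the signaling fence reached `value`.
        self.signaling = value
        for hook in self._hooks:
            hook(value)          # extra work runs while waiters are still blocked
        self.waiting = value     # only now is the signal propagated

    def app_completed_value(self) -> int:
        # The application's notion of the fence is based on the waiting fence.
        return self.waiting
```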
  • FIG. 3 depicts an alternate approach and describes a process 300 for synchronization object processing. FIG. 3 is similar to FIG. 2; however, step 209 has been replaced by steps 309, 311, 313, and 319. Here the signal operation (307) executed by processor 0 happens on the interception layer's signaling fence. The interception layer monitors this fence and receives the signal (309). The interception layer may perform any necessary or desired updates (311), and the signal is then propagated to the waiting fence (313). The signal on the waiting fence (319) is received by processor 1, unblocking it. Processor 1 is then free to continue executing other work (317).
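  • Process 300 can be modeled end to end with a listener thread playing the interception layer. This is a sketch using the same hypothetical threading-based fence; `monitor_and_propagate` is an invented helper that waits on the signaling fence (309), runs an update callback (311), and signals the waiting fence (313). A real layer would presumably rely on whatever completion-notification mechanism the graphics API provides rather than a dedicated thread per signal.

```python
import threading
from typing import Callable, List, Optional


class Fence:
    """Hypothetical CPU-side fence model."""

    def __init__(self) -> None:
        self.value = 0
        self.cond = threading.Condition()

    def signal(self, value: int) -> None:
        with self.cond:
            self.value = value
            self.cond.notify_all()

    def wait(self, value: int, timeout: Optional[float] = None) -> bool:
        with self.cond:
            return self.cond.wait_for(lambda: self.value >= value, timeout)


def monitor_and_propagate(signaling: Fence, waiting: Fence, value: int,
                          update: Callable[[int], None]) -> threading.Thread:
    """Start a listener for one signal operation (steps 309, 311, 313)."""

    def run() -> None:
        signaling.wait(value)   # 309: the layer receives the signal
        update(value)           # 311: necessary or desired updates
        waiting.signal(value)   # 313: propagate to the waiting fence

    listener = threading.Thread(target=run, daemon=True)
    listener.start()
    return listener


def run_fig3() -> List[str]:
    signaling, waiting = Fence(), Fence()
    log: List[str] = []
    lock = threading.Lock()

    def record(event: str) -> None:
        with lock:
            log.append(event)

    # The layer sees the application's signal command and starts listening.
    listener = monitor_and_propagate(signaling, waiting, 1,
                                     lambda v: record(f"layer: update for {v}"))

    def processor0() -> None:
        record("p0: work before signal")
        signaling.signal(1)             # 307
        record("p0: more work")

    def processor1() -> None:
        waiting.wait(1)                 # released by the signal on the waiting fence (319)
        record("p1: work 317")

    threads = [threading.Thread(target=processor0), threading.Thread(target=processor1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    listener.join()
    return log
```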
  • According to one or more embodiments, the frame debugging process (implemented via capture and replay modes) uses a second pair of fence objects. The pair of fence objects in use while the interception layer is in running mode may be implemented by the underlying runtime/driver in such a way that "replaying" a signal value (i.e., signaling the fence with a previously used value) may lead to incorrect behavior. Also, an application's use of a fence may be incompatible with replaying a signal value. For example, the application may be designed to generate new work when a signal of a given value is received or observed by a processor. The work may only be intended to be generated once. However, if the signal's value is reused repeatedly during replay of a frame, the application may generate multiple unintended workloads. The capture/replay process therefore uses a second pair of fence objects to avoid such incompatibilities. In this way, the interception layer and frame debugger can correctly track the fence usage of an application.
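  • The replay hazard can be shown with a toy fence that runs application callbacks when a value lands. `CallbackFence` and `count_generated_workloads` are hypothetical. Replaying the captured signal value on the running-mode fence re-triggers the application's one-shot callback on every loop, whereas replaying it on a separate fence leaves the application's callback untouched.

```python
from typing import Callable, Dict, List


class CallbackFence:
    """Hypothetical fence that fires registered callbacks when a value is signaled."""

    def __init__(self) -> None:
        self.value = 0
        self._callbacks: Dict[int, List[Callable[[], None]]] = {}

    def on_value(self, value: int, callback: Callable[[], None]) -> None:
        self._callbacks.setdefault(value, []).append(callback)

    def signal(self, value: int) -> None:
        self.value = value
        for callback in self._callbacks.get(value, []):
            callback()


def count_generated_workloads(use_separate_pair: bool, replay_loops: int = 3) -> int:
    generated: List[str] = []
    running = CallbackFence()
    # The application generates new work exactly once when value 3 lands.
    running.on_value(3, lambda: generated.append("new workload"))
    running.signal(3)                       # original signal during running mode
    replay_target = CallbackFence() if use_separate_pair else running
    for _ in range(replay_loops):
        replay_target.signal(3)             # replaying the captured signal value
    return len(generated)
```

With three replay loops, reusing the running fence yields four workloads instead of one, while the separate fence yields exactly one.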
  • According to one or more embodiments, when the user indicates that the interception layer should enter frame debugging (capture/replay) mode, the interception layer will internally redirect all application fence operations from the running mode signal/wait fence pair to the frame debugging pair. This may require bootstrapping the frame debugging pair by artificially signaling the fences to particular values that reflect the application's current progress. When the user indicates that the interception layer should return to running mode, the interception layer redirects all application fence operations to the original (running mode) pair of fences until the next mode change. The user can transition from running mode to frame debugging mode and back as many times as is desired.
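  • A possible shape for the mode switch is sketched below. All names are hypothetical and the fences are reduced to plain values. On a switch, the target pair is artificially signaled up to the application's current progress if it lags behind; applying the same catch-up when returning to running mode is an assumption of this sketch, not something the description specifies.

```python
from enum import Enum
from typing import Dict


class Mode(Enum):
    RUNNING = "running"
    FRAME_DEBUG = "frame_debug"


class SimpleFence:
    """Hypothetical value-only fence model."""

    def __init__(self) -> None:
        self.value = 0

    def signal(self, value: int) -> None:
        self.value = value


class FencePair:
    def __init__(self) -> None:
        self.signaling = SimpleFence()
        self.waiting = SimpleFence()


class RedirectingFence:
    """One application fence backed by a running-mode pair and a frame-debugging pair."""

    def __init__(self) -> None:
        self.pairs: Dict[Mode, FencePair] = {Mode.RUNNING: FencePair(),
                                             Mode.FRAME_DEBUG: FencePair()}
        self.mode = Mode.RUNNING

    @property
    def active(self) -> FencePair:
        return self.pairs[self.mode]

    def set_mode(self, mode: Mode) -> None:
        progress = self.active.waiting.value
        target = self.pairs[mode]
        # Bootstrap: artificially signal the target pair up to the application's progress.
        if target.waiting.value < progress:
            target.signaling.signal(progress)
            target.waiting.signal(progress)
        self.mode = mode

    def app_signal(self, value: int) -> None:
        # All application fence operations are redirected to the active pair.
        self.active.signaling.signal(value)

    def app_completed_value(self) -> int:
        return self.active.waiting.value
```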
  • Correct replay of the application's commands as recorded in function bundles may depend on detecting when the application has made a decision by observing the value of a fence object. According to one or more embodiments, knowing the order of application-specified commands relative to the time that a fence signal completes during capture mode allows the interception layer to maintain this ordering in replay mode. In one or more embodiments, this order is maintained during replay mode by inserting an artificial function bundle into the stream of function bundles at the time the interception layer receives a signal from the signaling fence during capture mode. This is done before propagating the signal to the waiting fence, so that any work dependent on the signal is captured after the artificial function bundle.
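  • The ordering rule above can be sketched as follows: the artificial bundle is appended before the signal is propagated, and the replay loop treats it as a wait point. `Bundle`, `Capture`, `replay`, and the `SIGNAL_LANDED` marker are hypothetical names; `propagate`, `execute`, and `wait_for_signal` are callbacks that a real layer would supply.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

SIGNAL_LANDED = "__signal_landed__"   # hypothetical name of the artificial bundle


@dataclass(frozen=True)
class Bundle:
    name: str                 # function name, or the artificial marker
    params: Tuple[Any, ...]


class Capture:
    def __init__(self, propagate: Callable[[str, int], None]) -> None:
        self.stream: List[Bundle] = []
        self._propagate = propagate

    def record_call(self, name: str, *params: Any) -> None:
        self.stream.append(Bundle(name, params))

    def on_signal_received(self, fence_id: str, value: int) -> None:
        # Insert the artificial bundle BEFORE propagating, so any work the
        # application issues in response to the signal is captured after it.
        self.stream.append(Bundle(SIGNAL_LANDED, (fence_id, value)))
        self._propagate(fence_id, value)


def replay(stream: List[Bundle], execute: Callable[[Bundle], None],
           wait_for_signal: Callable[[str, int], None]) -> None:
    for bundle in stream:
        if bundle.name == SIGNAL_LANDED:
            wait_for_signal(*bundle.params)   # honor the point where the signal landed
        else:
            execute(bundle)
```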
  • According to one or more embodiments, this application-specified behavior will be processed as intended with a two-fence implementation in the interception layer. Additionally, when capturing one or more frames of operations, a frame debugger interception layer will be able to correctly capture the order and timing of 1) the application signaling a fence, 2) the associated processor completing the work and the fence signaling or updating its value, 3) application operations that monitor or observe the value of the fence, and 4) application operations that request that a processor wait on a fence. Additionally, depending on the API, the interception layer will be able to properly record the order of operations triggered via callbacks associated with the signaling of a fence.
  • According to one or more embodiments, the captured application-specified behavior can be replayed while maintaining the same order of operations. This is possible because the interception layer knows the order of signal, monitor, and wait operations, in addition to knowing when the fence has actually been signaled. Knowing that the fence has been signaled is possible because the interception layer is always the first layer of software above the driver stack that is aware that a fence signal has completed. The interception layer notifies other layers via propagation of the signal to the waiting fence.
  • According to one or more embodiments, additional information collected during the frame capture and replay process may be used to detect improper fence usage. Knowledge of resource production and consumption by particular processors allows the frame debugger interception layer to know when synchronization must occur in order to produce correct results. Since the interception layer knows all the details about the application's intended synchronization operations, it can determine whether synchronization operations are missing, for example, operations that the application must issue in order to be correct but is not currently issuing. Such a condition would be an application bug that the frame debugger interception layer is able to report to the user. In the absence of such an automatic detection mechanism, a basic display of fence operations and resource operations can inform a user about improper fence usage. Additionally, the frame debugger interception layer may detect situations where a fence is used unnecessarily.
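  • One way such a missing-synchronization check could work is sketched below, assuming a hypothetical event log of writes, reads, signals, and waits recorded in completion order. Each processor accumulates the set of writes it is synchronized with (its own, plus those released by a signal it has waited on); a cross-processor read of a write outside that set is reported. The sketch does not model the unnecessary-fence case.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

# ("write", proc, resource) | ("read", proc, resource)
# ("signal", proc, fence, value) | ("wait", proc, fence, value)
Event = Tuple[Any, ...]


def find_missing_syncs(events: List[Event]) -> List[str]:
    known: Dict[str, Set[int]] = defaultdict(set)   # writes each processor is synchronized with
    last_write: Dict[str, Tuple[int, str]] = {}      # resource -> (write index, writer)
    signaled: Dict[str, List[Tuple[int, Set[int]]]] = defaultdict(list)  # fence -> (value, released writes)
    problems: List[str] = []

    for i, event in enumerate(events):
        kind, proc = event[0], event[1]
        if kind == "write":
            known[proc].add(i)
            last_write[event[2]] = (i, proc)
        elif kind == "signal":
            # A signal releases everything the signaling processor knows about.
            signaled[event[2]].append((event[3], set(known[proc])))
        elif kind == "wait":
            # A completed wait acquires every release at or below the awaited value.
            for value, released in signaled[event[2]]:
                if value <= event[3]:
                    known[proc] |= released
        elif kind == "read" and event[2] in last_write:
            write_index, writer = last_write[event[2]]
            if writer != proc and write_index not in known[proc]:
                problems.append(f"{proc} reads {event[2]} without waiting for {writer}'s write")
    return problems
```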
  • Exemplary Computing System
  • As presented in FIG. 4, an exemplary computer system 400 upon which embodiments of the present invention may be implemented (such as the processes 200 and 300 described above) includes a general-purpose computing system environment. In its most basic configuration, computing system 400 typically includes at least one processing unit 401 and memory, and an address/data bus 409 (or other interface) for communicating information. Depending on the exact configuration and type of computing system environment, memory may be volatile (such as RAM 402), non-volatile (such as ROM 403, flash memory, etc.) or some combination of the two.
  • Computer system 400 may also comprise an optional graphics subsystem 405 for presenting information to the computer user, e.g., by displaying information on an attached display device 410. In one embodiment, the processing of one or more tasks (e.g., commands and instructions) of an application executing in computer system 400 may be performed, in whole or in part, by graphics subsystem 405 in conjunction with the processor 401 and memory 402. According to various embodiments of the present invention, a first portion of a plurality of tasks may be assigned by the application to the processor 401, with a second portion of the plurality of tasks being dependent on one or more tasks of the first portion of tasks, and being assigned to be performed by the graphics subsystem 405. In one or more embodiments, the first and second portions are assigned to two or more processors 401, two or more graphics subsystems 405, or any combination thereof.
  • Additionally, computing system 400 may also have additional features/functionality. For example, computing system 400 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 4 by data storage device 407. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. RAM 402, ROM 403, and data storage device 407 are all examples of computer storage media.
  • Computer system 400 also comprises an optional alphanumeric input device 406, an optional cursor control or directing device 407, and one or more signal communication interfaces (input/output devices, e.g., a network interface card) 409. Optional alphanumeric input device 406 can communicate information and command selections to central processor 401. Optional cursor control or directing device 407 is coupled to bus 409 for communicating user input information and command selections to central processor 401. Signal communication interface (input/output device) 409, also coupled to bus 409, can be a serial port. Communication interface 409 may also include wireless communication mechanisms. Using communication interface 409, computer system 400 can be communicatively coupled to other computer systems over a communication network such as the Internet or an intranet (e.g., a local area network), or can receive data (e.g., a digital television signal).
  • Embodiments described herein provide a new approach for performing synchronization of application processing tasks and for performing debugging and data analysis of discretized and tokenized units or function bundles produced during the execution of the processing tasks. The invention described herein enables more efficient parallel processing while still maintaining sequential order and avoiding data hazards by using separate, non-blocking fence primitives.
  • In the foregoing specification, embodiments have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicant to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Hence, no limitation, element, property, feature, advantage, or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims (23)

What is claimed is:
1. A method for performing application-based synchronization between two or more processors, the method comprising:
in a computing system comprising a plurality of processors, the plurality of processors comprising at least a first processor and a second processor,
performing, in the first and second processor, a first and second plurality of tasks, respectively, the first and second plurality of tasks being comprised from a sequence of commands issued by an application executing in the computing system;
suspending, via a waiting synchronization object, a performance of a third plurality of tasks in the second processor when the second plurality of tasks is completed by the second processor;
signaling a signaling synchronization object when the first plurality of tasks is completed by the first processor;
propagating a signal from the signaling synchronization object to the waiting synchronization object;
performing the third plurality of tasks in the second processor based on the propagated signal,
wherein the waiting synchronization object and the signaling synchronization object appear as a single synchronization object to the application.
2. The method according to claim 1, wherein the third plurality of tasks comprises at least one task that is dependent on a completion of at least one task of the first plurality of tasks performed by the first processor.
3. The method according to claim 1, wherein the waiting synchronization object and the signaling synchronization object are managed by an interception layer executing between the application and an API of the driver of at least one of the first and second processors.
4. The method according to claim 3, wherein the waiting synchronization object and the signaling synchronization object are generated internally in the interception layer in response to a request by the application to create a single synchronization object with both waiting and signaling functionality.
5. The method according to claim 1, wherein the waiting synchronization object comprises a waiting fence object and the signaling synchronization object comprises a signaling fence object.
6. The method according to claim 1, wherein the signaling synchronization object has a value corresponding to a state of a progress of a performance of an assigned plurality of tasks in at least one of the first and second processors.
7. The method according to claim 1, wherein the waiting synchronization object has a value corresponding to a state of a progress of a performance of an assigned plurality of tasks in at least one of the first and second processors as indicated by the signaling synchronization object after processing and propagation by an interception layer.
8. The method according to claim 7, wherein a current state of the performance of the assigned plurality of tasks in the application corresponds to the value of the waiting synchronization object.
9. The method according to claim 1, wherein an interception layer performs an operation after the signaling synchronization object is signaled but before propagating the signal to the waiting synchronization object.
10. The method according to claim 9, wherein the operation is comprised from a group of operations consisting of:
data verification;
task verification;
data logging;
data analysis;
consistency checking; and
data profiling.
11. The method according to claim 1, further wherein the first processor is operable to perform additional tasks from the plurality of tasks after signaling the signaling synchronization object.
12. A system for frame debugging and synchronization, the system comprising:
a memory device comprising a plurality of programmed instructions;
a first processor;
a second processor;
an application executing on at least one of the first and second processors based on the programmed instructions, the application using an Application Programming Interface (API); and
an interception layer operating between the API and a driver of at least one of the first and second processors, the interception layer being configured to: generate a first signaling synchronization object and a separate first waiting synchronization object, to intercept signal commands and wait commands from the application, to apply the signal commands to the first signaling synchronization object and to propagate wait commands to the first waiting synchronization object,
further wherein the first signaling synchronization object and the first waiting synchronization object appear as a single synchronization object to the application.
13. The system according to claim 12, wherein at least one of the first and second processors is a central processing unit (CPU).
14. The system according to claim 12, wherein at least one of the first and second processors is a graphics processing unit (GPU).
15. The system according to claim 12, wherein the first signaling synchronization object comprises a signaling fence primitive and the first waiting synchronization object comprises a waiting fence primitive.
16. The system according to claim 12, wherein the interception layer is further configured to apply at least one of: a signal operation to the first signaling synchronization object, a query operation to the first waiting synchronization object, and a wait operation to the first waiting synchronization object, based on a command from the application.
17. The system according to claim 12, further comprising:
a first value corresponding to a state of progress of the application in submitting the first and second plurality of tasks to be performed by the first and second processors;
a second value corresponding to a value of the first signaling synchronization object;
a second value corresponding to the first waiting synchronization object;
and a third value corresponding to the state of progress perceived by the application for performed tasks of the first and second plurality of tasks.
18. The system according to claim 17, further wherein the first value is indicative of a state of progress of a performance of a plurality of tasks in at least one of the first and second processors, the second value corresponds to the state of progress indicated by the first value and propagated by the interception layer to the first waiting synchronization object, and the third value corresponds to a state of progress of the performance of the plurality of tasks as perceived by the application and is based on the second value.
19. The system according to claim 12, wherein the interception layer is further configured to generate a second signaling synchronization object and a second waiting synchronization object, and to record a plurality of parameters and a state of a performance of a plurality of tasks by redirecting commands intended for the first signaling synchronization object to the second signaling synchronization object and commands intended for the first waiting synchronization object to the second waiting synchronization object.
20. The system according to claim 19, wherein the interception layer is further configured to replay the recorded plurality of parameters and the state of the plurality of tasks based on user input.
21. A method for performing application-based frame debugging, the method comprising:
in a computing system comprising a first processor and a second processor,
generating a first and second pair of synchronization objects, the first pair of synchronization objects comprising a first signaling synchronization object and a first waiting synchronization object, the second pair of synchronization objects comprising a second signaling synchronization object and a second waiting synchronization object;
performing a first portion of a plurality of tasks in the first and second processors using the first pair of synchronization objects to ensure an order of the performance of the first portion of the plurality of tasks;
entering a frame debugging mode based on user input;
performing a second portion of a plurality of tasks in the first and second processors using the second pair of synchronization objects to ensure an order of the performance of the second portion of the plurality of tasks by redirecting signal commands intended for the first signaling synchronization object to the second signaling synchronization object and propagating the redirected signal commands intended for the first waiting synchronization object to the second waiting synchronization object;
recording a state of the application and a plurality of parameters while signal commands intended for the first signaling synchronization object are redirected to the second signaling synchronization object and wait commands intended for the first waiting synchronization object are propagated to the second waiting synchronization object; and
exiting the frame debugging mode based on user input.
22. The method according to claim 21, wherein the recording a state of the application and a plurality of parameters comprises:
replaying the recording in response to a received user input.
23. The method according to claim 21, wherein the recording a state of the application and a plurality of parameters comprises at least one operation from the group of operations consisting of:
analyzing a performance of the first plurality of tasks;
generating a profile based on the analyzed performance;
outputting the profile based on user input;
determining an absence of a synchronization operation; and
determining a presence of unnecessary synchronization operations.
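The fence interception recited in claims 12 to 23 can be illustrated with a minimal Python sketch. It assumes monotonically increasing fence values, in the style of Direct3D 12 fences or Vulkan timeline semaphores, and models only one signaling/waiting pair. The names `Fence`, `InterceptionLayer`, `enter_frame_debug`, `redundant_waits` and `replay` are hypothetical and do not correspond to any NVIDIA or graphics-API interface. This is an illustration of the idea, not the patented implementation. In normal mode, signals reach the application's own fences. In frame debugging mode, they are redirected to a second (shadow) pair and recorded. The signaled value is propagated to the waiting object, and the value the application queries (the "third value") is read from the waiting side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fence:
    """Monotonic fence counter (assumed semantics, similar to a timeline fence)."""
    name: str
    value: int = 0

    def signal(self, value: int) -> None:
        # Fence values never move backwards.
        self.value = max(self.value, value)

    def query(self) -> int:
        return self.value

    def is_reached(self, value: int) -> bool:
        return self.value >= value


class InterceptionLayer:
    """Hypothetical layer between the application and the graphics API."""

    def __init__(self, app_signal: Fence, app_wait: Fence) -> None:
        self.app_signal = app_signal  # first signaling synchronization object
        self.app_wait = app_wait      # first waiting synchronization object
        self.shadow_signal: Optional[Fence] = None  # second pair, created on demand
        self.shadow_wait: Optional[Fence] = None
        self.debugging = False
        self.log: List[Tuple[str, str, int]] = []  # recorded (op, fence, value)

    def _active(self) -> Tuple[Fence, Fence]:
        # Commands go to the shadow pair while frame debugging is active.
        if self.debugging:
            return self.shadow_signal, self.shadow_wait
        return self.app_signal, self.app_wait

    def enter_frame_debug(self) -> None:
        # Seed the second pair with the progress reached so far.
        self.shadow_signal = Fence("shadow_signal", self.app_signal.value)
        self.shadow_wait = Fence("shadow_wait", self.app_wait.value)
        self.debugging = True
        self.log.clear()

    def exit_frame_debug(self) -> None:
        # Hand the accumulated progress back to the application's fences.
        self.app_signal.signal(self.shadow_signal.value)
        self.app_wait.signal(self.shadow_wait.value)
        self.debugging = False

    def signal(self, value: int) -> None:
        signaling, waiting = self._active()
        signaling.signal(value)
        waiting.signal(signaling.value)  # propagate progress to the waiting object
        if self.debugging:
            self.log.append(("signal", signaling.name, value))

    def wait(self, value: int) -> bool:
        _, waiting = self._active()
        if self.debugging:
            self.log.append(("wait", waiting.name, value))
        return waiting.is_reached(value)

    def query(self) -> int:
        # Progress as perceived by the application (the "third value").
        return self._active()[1].query()

    def redundant_waits(self) -> List[int]:
        # Heuristic: a wait is redundant if an earlier recorded wait already
        # required an equal or higher fence value.
        redundant, highest = [], 0
        for i, (op, _, value) in enumerate(self.log):
            if op == "wait":
                if value <= highest:
                    redundant.append(i)
                highest = max(highest, value)
        return redundant

    def replay(self) -> int:
        # Re-apply the recorded signals to a fresh fence; return its final value.
        fence = Fence("replay")
        for op, _, value in self.log:
            if op == "signal":
                fence.signal(value)
        return fence.value


if __name__ == "__main__":
    app_signal, app_wait = Fence("app_signal"), Fence("app_wait")
    layer = InterceptionLayer(app_signal, app_wait)
    layer.signal(1)            # normal mode: reaches the application's fence
    layer.enter_frame_debug()
    layer.signal(2)            # redirected to shadow_signal and recorded
    layer.wait(2)
    layer.wait(1)              # flagged as redundant
    print(app_signal.value, layer.query(), layer.redundant_waits(), layer.replay())  # → 1 2 [2] 2
    layer.exit_frame_debug()
    print(app_signal.value)    # → 2
```

Seeding the shadow pair from the current fence values keeps the redirection invisible to the application. `query()` keeps returning consistent progress across the switch. The redundancy check is a deliberately simple stand-in for the "unnecessary synchronization operations" analysis of claim 23. It assumes a single in-order queue and is not a general detector.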
US14/845,123 2015-08-07 2015-09-03 Method and apparatus for interception of synchronization objects in graphics application programming interfaces for frame debugging Active 2035-11-24 US9910760B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/845,123 US9910760B2 (en) 2015-08-07 2015-09-03 Method and apparatus for interception of synchronization objects in graphics application programming interfaces for frame debugging

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562202743P 2015-08-07 2015-08-07
US14/845,123 US9910760B2 (en) 2015-08-07 2015-09-03 Method and apparatus for interception of synchronization objects in graphics application programming interfaces for frame debugging

Publications (2)

Publication Number Publication Date
US20170039124A1 2017-02-09
US9910760B2 2018-03-06

Family

ID=58052624

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/845,123 Active 2035-11-24 US9910760B2 (en) 2015-08-07 2015-09-03 Method and apparatus for interception of synchronization objects in graphics application programming interfaces for frame debugging

Country Status (1)

Country Link
US (1) US9910760B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11036524B1 (en) * 2017-07-18 2021-06-15 FullStory, Inc. Capturing and processing interactions with a user interface of a native application

Patent Citations (21)

Publication number Priority date Publication date Assignee Title
US20020199173A1 (en) * 2001-01-29 2002-12-26 Matt Bowen System, method and article of manufacture for a debugger capable of operating across multiple threads and lock domains
US7958497B1 (en) * 2006-06-07 2011-06-07 Replay Solutions, Inc. State synchronization in recording and replaying computer programs
US20080114937A1 (en) * 2006-10-24 2008-05-15 Arm Limited Mapping a computer program to an asymmetric multiprocessing apparatus
US20100211933A1 (en) * 2009-02-19 2010-08-19 Nvidia Corporation Debugging and perfomance analysis of applications
US9256514B2 (en) * 2009-02-19 2016-02-09 Nvidia Corporation Debugging and perfomance analysis of applications
US8522000B2 (en) * 2009-09-29 2013-08-27 Nvidia Corporation Trap handler architecture for a parallel processing unit
US20110078427A1 (en) * 2009-09-29 2011-03-31 Shebanow Michael C Trap handler architecture for a parallel processing unit
US20120081378A1 (en) * 2010-10-01 2012-04-05 Jean-Francois Roy Recording a Command Stream with a Rich Encoding Format for Capture and Playback of Graphics Content
US20120081376A1 (en) * 2010-10-01 2012-04-05 Sowerby Andrew M Graphics System which Utilizes Fine Grained Analysis to Determine Performance Issues
US20130091494A1 (en) * 2011-10-11 2013-04-11 Andrew M. Sowerby Suspending and Resuming a Graphics Application Executing on a Target Device for Debugging
US9298586B2 (en) * 2011-10-11 2016-03-29 Apple Inc. Suspending and resuming a graphics application executing on a target device for debugging
US20130185346A1 (en) * 2011-12-07 2013-07-18 Apple Inc. Proofing electronic publications on portable devices
US20130159780A1 (en) * 2011-12-16 2013-06-20 Ryan D. Bedwell Correlating traces in a computing system
US20140146062A1 (en) * 2012-11-26 2014-05-29 Nvidia Corporation System, method, and computer program product for debugging graphics programs locally utilizing a system with a single gpu
US9292414B2 (en) * 2012-11-26 2016-03-22 Nvidia Corporation System, method, and computer program product for debugging graphics programs locally utilizing a system with a single GPU
US20140168231A1 (en) * 2012-12-18 2014-06-19 Nvidia Corporation Triggering performance event capture via pipelined state bundles
US20140189544A1 (en) * 2012-12-27 2014-07-03 Nvidia Corporation Web-based graphics development system and method of graphics program interaction therewith
US9195829B1 (en) * 2013-02-23 2015-11-24 Fireeye, Inc. User interface with real-time visual playback along with synchronous textual analysis log display and event/time index for anomalous behavior detection in applications
US20140372990A1 (en) * 2013-06-12 2014-12-18 Nvidia Corporation Method and system for implementing a multi-threaded api stream replay
US9477575B2 (en) * 2013-06-12 2016-10-25 Nvidia Corporation Method and system for implementing a multi-threaded API stream replay
US20160011955A1 (en) * 2014-07-09 2016-01-14 Microsoft Corporation Extracting Rich Performance Analysis from Simple Time Measurements

Non-Patent Citations (3)

Title
André Alexandre Wang Liu, "3D Application Debugging", Minho’s University, December 2014, pp. 1-101; <https://repositorium.sdum.uminho.pt/bitstream/1822/37267/1/eeum_di_dissertacao_pg22841.pdf> *
Podila et al., "A Visualization Tool for 3D Graphics Program Comprehension and Debugging", IEEE, October 2016, pp. 111-115; <http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7780167> *
William B. Langdon, "Debugging CUDA", ACM, GECCO’11, July 2011, pp. 415-422; <https://dl.acm.org/citation.cfm?id=2002028&CFID=993336535&CFTOKEN=51222302> *

Cited By (5)

Publication number Priority date Publication date Assignee Title
US11023364B2 (en) * 2015-05-12 2021-06-01 Suitest S.R.O. Method and system for automating the process of testing of software applications
US20180033115A1 (en) * 2016-07-29 2018-02-01 Microsoft Technology Licensing, Llc Capturing Commands in a Multi-Engine Graphics Processing Unit
US10198784B2 (en) * 2016-07-29 2019-02-05 Microsoft Technology Licensing, Llc Capturing commands in a multi-engine graphics processing unit
US20190272206A1 (en) * 2018-03-02 2019-09-05 Microsoft Technology Licensing, Llc Techniques for detecting faults in rendering graphics
US10733076B2 (en) * 2018-03-02 2020-08-04 Microsoft Technology Licensing, Llc Techniques for detecting faults in rendering graphics

Also Published As

Publication number Publication date
US9910760B2 (en) 2018-03-06

Legal Events

Date Code Title Description
AS Assignment

Owner name: NVIDIA CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIEL, JEFFREY;PRICE, DAN;STRAUSS, MIKE;REEL/FRAME:036491/0771

Effective date: 20150820

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4