API Change Proposal for Sikuli 0.10
Goal
To make Sikuli Script's API more consistent and easier to use, we propose a new API design that will be used in the 0.10 releases.
Major Changes
- Global functions, Subregion's methods, and Match's methods will be consolidated to form a consistent interface.
- Within a with region block, all Sikuli functions are redirected to the specified region.
- For a region object, an observation loop can be set up to allow visual event driven programming.
- find() will support PatternClass, a class of patterns, which can be faces, buttons, windows, drop-down boxes, etc.
- The number of returned matches will no longer be tied to the Pattern. find() always returns one Match (the best match).
- findAll() will return an iterator of Match. Matches will be replaced by Iterator<Match>.
- find.region and find.regions will be replaced by region.getLastMatch() and region.getLastMatches(), respectively. After a findAll() executes successfully, the matched objects (Iterator<Match>) will be saved in its region and can be retrieved with region.getLastMatches(). Similarly, once a find() or a wait() executes successfully, the best match object will be saved in its region and can be retrieved with region.getLastMatch(). These two attributes ARE set if a find() is implicitly executed in actions or if a find() fails.
- Every action command will take exactly one target. Actions on multiple targets can be done with a for loop. We want to keep Sikuli's core commands as simple as possible. Since the xxxAll commands are rarely used and add complexity to the Sikuli API, we have decided to remove all of them.
Improvements
- No more temporary screen shot files. Screen shots will be taken and processed directly in memory. This speeds up everything.
- It will be easier to search image patterns within a subregion of screen or on different monitors (if any).
The following sections describe these changes in detail.
Consolidate Subregion and global commands into a single Region class and have a new Screen class
- The Subregion class will be renamed to Region.
- There will be a Screen class that inherits Region and is predefined to cover each screen. Multi-screen environments will be supported.
- By default, there is a global Screen object instance called "screen" which is set to cover the default (primary) screen. Users can call its methods without explicitly naming it. For example, instead of screen.find(), users can also write find() to achieve the same thing. In addition, users can call setROI(rectangle)/setRect(rectangle) to tell Sikuli to look only into that rectangle on the screen. This effectively "shrinks" the screen.
- The following code explains how the new system works:
    find(IMG)                      # finds IMG on the default screen, say (0,0)-(1280,800)
    screen.find(IMG)               # is equivalent to the above line
    setROI(100,300,100,100)        # sets the region of interest to (100,300)-(200,400)
    find(IMG)                      # finds IMG on the default screen, limited to the region of interest (100,300)-(200,400)
    Region(0,0,100,100).find(IMG)  # finds IMG within the region (0,0)-(100,100)
- To work on multiple regions, a user could create many Region objects and write lots of r1.find(), r1.click(), r2.find(), r2.rightClick(), etc. Sikuli simplifies this scenario with the with-statement (syntax available only in Python 2.5+). In a with block, all global Sikuli functions are overridden by the given Region's methods, which work only within the specified region.
    find(IMG)          # finds IMG on the default screen
    with Region(0,0,500,500):
        find(IMG)      # finds IMG within the region (0,0)-(500,500) instead of on the whole screen
        click(IMG)     # clicks on IMG within the region (0,0)-(500,500) instead of on the whole screen
    find(IMG)          # finds IMG on the default screen
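One way the redirection inside a with block could work is a stack of active regions that the global functions consult. The following is only a minimal pure-Python sketch of that idea, not Sikuli's implementation: the `_active` stack, the 1280x800 stand-in screen, and the tuple returned by `find()` are all assumptions made for illustration.

```python
# Hypothetical model only; a real find() would search screen pixels.
class Region(object):
    _active = []  # stack of regions entered through "with"

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h

    def find(self, img):
        # Report which rectangle would be searched instead of searching it.
        return (img, (self.x, self.y, self.w, self.h))

    def __enter__(self):
        Region._active.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        Region._active.pop()
        return False  # do not swallow exceptions raised in the block


screen = Region(0, 0, 1280, 800)  # stand-in for the default screen


def find(img):
    # Global find(): use the innermost "with" region, else the screen.
    target = Region._active[-1] if Region._active else screen
    return target.find(img)


before_block = find("IMG")
with Region(0, 0, 500, 500):
    within_block = find("IMG")
after_block = find("IMG")
```

In this sketch, leaving the block pops the region again, so the last call searches the whole screen just as the first one does.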
Spatial Operators
|       | above  |       |
| left  | inside | right |
|       | below  |       |
A Region has six spatial operators for specifying another region that is related to it.
- inside(): returns the region itself. Just syntactic sugar that makes scripts more readable.
- nearby( [range] ): returns the rectangular region extended with range pixels in each direction. The default range is 50 pixels.
- The following spatial operators return a rectangular region adjacent to the corresponding border of the region, extending range pixels in width (left, right) or in height (above, below). If range is omitted, the new region extends to the corresponding border of the screen. The other dimension of the new region is the same as that of the original region.
- above( [range] )
- right( [range] )
- left( [range] )
- below( [range] )
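The geometry of these operators can be sketched in plain Python. This is a hedged illustration rather than the proposed implementation: it assumes a single 1280x800 screen with its origin at (0,0), does not clip results to the screen, and uses the hypothetical parameter name `rng` and helper `rect()`.

```python
SCREEN_W, SCREEN_H = 1280, 800  # assumed screen size for this sketch


class Region(object):
    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h

    def rect(self):
        # Hypothetical helper returning (x, y, w, h) for easy comparison.
        return (self.x, self.y, self.w, self.h)

    def inside(self):
        return self

    def nearby(self, rng=50):
        # Grow the rectangle by rng pixels in every direction.
        return Region(self.x - rng, self.y - rng,
                      self.w + 2 * rng, self.h + 2 * rng)

    def above(self, rng=None):
        h = self.y if rng is None else rng  # default: up to the screen top
        return Region(self.x, self.y - h, self.w, h)

    def below(self, rng=None):
        bottom = self.y + self.h
        h = SCREEN_H - bottom if rng is None else rng
        return Region(self.x, bottom, self.w, h)

    def left(self, rng=None):
        w = self.x if rng is None else rng  # default: up to the left edge
        return Region(self.x - w, self.y, w, self.h)

    def right(self, rng=None):
        edge = self.x + self.w
        w = SCREEN_W - edge if rng is None else rng
        return Region(edge, self.y, w, self.h)


r = Region(100, 300, 100, 100)
strip_above = r.above(20)   # 20-pixel strip directly above r
all_right = r.right()       # everything from r's right border to the screen edge
```

Each operator returns a new Region, so calls can be chained, for example `r.right().below(50)`.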
Observer (Visual-event-driven programming)
We use the word "observer" rather than "listener" because it is natural to say "Sikuli observes visual events".
- each Region object can have one observer running (observation loop)
- since you can define as many regions as you want, there is no limit on concurrently running observers
- an observer can run in foreground (main script waits) or in background (main script continues)
- each observer observes one or more visual events out of 3 different types (onAppear(), onVanish(), onChange())
- each visual event specifies a handler that is called when the event happens
- during processing of a handler, the corresponding observation loop waits for the handler to return
- exchange of information between main script and handlers can be implemented by using global variables
- when an observation loop ends, all corresponding visual events are no longer observed
- How to end an observation loop?
- observe([time=decimal]) # observation stops after decimal seconds
- region.stopObserver() # observation on Region is stopped (e.g. in main script if observation is running in background)
- event.region.stopObserver() # inside a handler: observation on containing region is stopped
    r1 = Region(...)
    r2 = Region(...)

    def handler1(event):
        print "p1 appears"
        event.region.stopObserver()  # stops the observation on r1 if p1 appears, the same as r1.stopObserver()

    r1.onAppear(p1, handler1)
    r1.onVanish(p2, handler2)
    r1.observe(background=True)  # runs in background - main script continues

    r2.onChange(handler3)
    r2.observe()                 # runs in foreground - main script waits
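To make the control flow of a foreground observation loop concrete, here is a small pure-Python model. It is a sketch under stated assumptions, not the proposed implementation: screenshots are replaced by a hypothetical list of `frames` (each a set of pattern names visible at one polling step), events fire only on a change between consecutive frames, and background mode and onChange() are left out.

```python
class Event(object):
    def __init__(self, region, pattern):
        self.region = region
        self.pattern = pattern


class Region(object):
    def __init__(self, frames):
        self.frames = frames          # hypothetical stand-in for screenshots
        self.appear_handlers = []
        self.vanish_handlers = []
        self.running = False

    def onAppear(self, pattern, handler):
        self.appear_handlers.append((pattern, handler))

    def onVanish(self, pattern, handler):
        self.vanish_handlers.append((pattern, handler))

    def stopObserver(self):
        self.running = False

    def observe(self):
        # Foreground loop: the caller waits until the loop ends.
        self.running = True
        previous = set()
        for visible in self.frames:
            if not self.running:
                break
            for pattern, handler in self.appear_handlers:
                if pattern in visible and pattern not in previous:
                    handler(Event(self, pattern))  # loop waits for the handler
            for pattern, handler in self.vanish_handlers:
                if pattern in previous and pattern not in visible:
                    handler(Event(self, pattern))
            previous = visible
        self.running = False


log = []


def handler1(event):
    log.append("p1 appears")
    event.region.stopObserver()  # ends the loop from inside a handler


def handler2(event):
    log.append("p2 vanishes")


r1 = Region([set(["p2"]), set(), set(["p1"]), set(["p1", "p2"])])
r1.onAppear("p1", handler1)
r1.onVanish("p2", handler2)
r1.observe()
```

Because handler1 calls stopObserver(), the fourth frame is never examined. The handlers share state with the main script through the global `log`, in the spirit of the global-variable exchange described above.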
Finder ( Iterator<Match>, findAll() )
Sikuli 0.10 will provide new APIs that return an iterator of Match instead of an array of Match (Matches). The basic template matching finder in the Java layer is defined as follows.
    class Finder implements Iterator<Match> {
        // a user can provide the image of the screen in which find() looks for the given templates
        public Finder(String screenFilename);
        // finds the template image within the given screen image with minimal similarity = minSimilarity
        public void find(String templateFilename, double minSimilarity=0.0);
        public boolean hasNext();  // checks if there are more matches
        public Match next();       // gets the next match
        public void remove();      // does nothing
    }
Jython's Finder will simply wrap up this Java Finder.
- findAll() returns an iterator that needs to be destroyed by the user
One method is to call iterator.destroy() manually:
    matches = findAll(IMG)
    # do something on matches
    matches.destroy()  # this needs to be called by the user
The other way is to use a with statement as follows, and the iterator will be destroyed automatically:
    with findAll(IMG) as matches:  # a with statement can destroy the iterator automatically
        pass  # do something on matches
    # no need to destroy matches
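The automatic cleanup relies on Python's context-manager protocol. The sketch below shows one way an iterator could support both styles. The class name `MatchIterator`, the `destroyed` flag, and the use of plain strings as matches are assumptions for illustration, not part of the proposed API.

```python
class MatchIterator(object):
    def __init__(self, matches):
        self._matches = list(matches)
        self._pos = 0
        self.destroyed = False

    def __iter__(self):
        return self

    def next(self):  # Python 2 iterator protocol
        if self.destroyed or self._pos >= len(self._matches):
            raise StopIteration
        match = self._matches[self._pos]
        self._pos += 1
        return match

    __next__ = next  # same method under its Python 3 name

    def destroy(self):
        # A real implementation would release native resources here.
        self._matches = []
        self.destroyed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.destroy()  # runs even if the block raises
        return False


# Style 1: manual cleanup.
manual = MatchIterator(["m1", "m2"])
manual_seen = [m for m in manual]
manual.destroy()

# Style 2: cleanup through the with statement.
with MatchIterator(["m1", "m2"]) as matches:
    auto_seen = [m for m in matches]
```

The with form is the safer of the two in this sketch, because destroy() still runs when the loop body raises an exception.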
Action Functions
- Each action function accepts five types of target: Pattern, String (path to an image), Region, Match, and Location (PSRML). The function dragDrop accepts any combination of these five types for its two targets.
- Each action function acts on a single target. To act on multiple targets, a user should use a loop, e.g. for x in findAll(...): click(x).
New API Specifications
In the following, PSRML is shorthand for Pattern/String/Region/Match/Location. Using a Pattern or a String as the target of an action causes an implicit find() before the action is executed.
- class Region
- Region( x, y, w, h ) # constructor to create a Region
- Region( Region ) # constructor to create a Region
- setX(), setY(), setW(), setH(), setRect(), setROI() # setROI is an alias of setRect whose name makes its purpose clearer
- getX(), getY(), getW(), getH(), getRect(), getROI()
- getScreen() # returns the Screen that this region belongs to
- getCenter() # returns the Location of the center of this region
- Spatial Operators: nearby( [range] ), inside(), right( [range] ), left( [range] ), above( [range] ), below( [range] )
- find( Pattern/String/PatternClass ) # returns the best match as a Match object and sets region.getLastMatch()
- findAll( Pattern/String/PatternClass ) # returns all matches as Iterator<Match> and sets region.getLastMatches()
- wait( [Pattern/String/PatternClass], [seconds] ) # the timeout is now given in seconds instead of milliseconds; sets region.getLastMatch()
- exists([Pattern/String/PatternClass], [seconds]) # checks if the target exists. The same as wait(), but does not throw a FindFailed exception if nothing is found.
- waitVanish( [Pattern/String/PatternClass], [seconds] ) # returns true if the given target vanishes, otherwise returns false.
- click( PSRML, [modifiers])
- rightClick( PSRML, [modifiers])
- doubleClick( PSRML, [modifiers])
- hover( PSRML )
- dragDrop( PSRML, PSRML, [modifiers] )
- drag( PSRML ) # to use in more sophisticated drag-drop actions together with mouseMove()
- dropAt( PSRML, [delay]) # to use in more sophisticated drag-drop actions together with mouseMove()
- type( [PSRML], text, [modifiers] )
- paste( [PSRML], text )
- keyDown(key/list of keys) # press the given keys
- keyUp([key/list of keys]) # release the given keys. If nothing is given, release all pressed keys.
- mouseDown( button ) # low-level mouse function: press and hold button
- mouseUp( [button] ) # low-level mouse function: release the given button. If nothing is given, release all pressed buttons.
- mouseMove( PSRML )
- onAppear(Pattern/String/PatternClass, handler) # invokes the handler if the given pattern appears.
- onVanish(Pattern/String/PatternClass, handler) # invokes the handler if the given pattern vanishes.
- onChange(handler) # invokes the handler if anything changes in this region.
- observe([time=seconds], [background=False]) # enters the observation loop. background=True: observe() will run in background. time > 0: observe() stops after time seconds, otherwise, it runs until stopObserver() is called.
- stopObserver() # stops the observation loop; a foreground observation started without a time limit doesn't stop unless the user explicitly calls stopObserver() in a handler
- supports with Region(): See above.
- class Screen extends Region
- Screen() # returns the default screen
- Screen(id) # returns the screen specified by id
- static getNumberScreens() # returns the number of screens
- getBounds() # returns the bounds of this screen
- static getBounds(id) # returns the bounds of the screen specified by id. The same as Screen(id).getBounds(), but this doesn't create a dummy Screen instance.
- only available in Java.
- String capture(Region), capture( x, y, w, h ) # captures the specified region. returns the file name to the captured image.
- String capture() # interactive capture
- Region selectRegion() # asks the user to specify a rectangular region on the screen and returns it as a Region.
- supports with Screen([id]): # Screen is a Region, therefore with also works on any Screen instance.
- class Match extends Region
- getScore()
- getTarget() # returns a Location ( = getCenter() + the Pattern's target offset ); the same as getCenter() if the find() was not based on a Pattern with a target offset
- class Pattern
- Pattern(String) # constructor, String(path to an image)
- similar(similarity) # returns a new Pattern that has the specified similarity
- exact() # returns a new Pattern whose similarity = 1.0
- targetOffset(dx,dy) # returns a new Pattern with specified target offset (the offset coordinate of an action's target, which is relative to the center of the pattern).
- class Location
- Location(x, y) # constructor
- Location(location) # constructor
- getX(), getY()
- offset(dx, dy) # returns a new Location(x+dx, y+dy)
- left(dx) # returns a new Location(x-dx,y)
- right(dx) # returns a new Location(x+dx,y)
- above(dy) # returns a new Location(x,y-dy)
- below(dy) # returns a new Location(x,y+dy)
- additional built-in functions (no change)
- openApp( name of an application )
- switchApp( name of an application )
- closeApp( name of an application )
- String input( [message] )
- exit()
- class VDict
- no change
- class Env
- static getOS() # returns OS.MAC, OS.WINDOWS, OS.LINUX
- static getMouseLocation() # returns the Location of the mouse cursor
- class Constants
- FOREVER # can be used in observe(), wait(), waitVanish(), exists()
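The Location arithmetic and the relationship between a match's center and its target can be illustrated with a short pure-Python sketch. It assumes screen coordinates in which y grows downward and a center computed with integer division. The helpers `center_of()` and `target_of()` are hypothetical names introduced only for this example.

```python
class Location(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def getX(self):
        return self.x

    def getY(self):
        return self.y

    def offset(self, dx, dy):
        # Always returns a new Location; the original is left unchanged.
        return Location(self.x + dx, self.y + dy)

    def left(self, dx):
        return self.offset(-dx, 0)

    def right(self, dx):
        return self.offset(dx, 0)

    def above(self, dy):
        return self.offset(0, -dy)  # smaller y is higher on the screen

    def below(self, dy):
        return self.offset(0, dy)


def center_of(x, y, w, h):
    # Assumed definition of getCenter() for a rectangle (x, y, w, h).
    return Location(x + w // 2, y + h // 2)


def target_of(x, y, w, h, dx=0, dy=0):
    # Models getTarget(): the center shifted by the pattern's target offset.
    return center_of(x, y, w, h).offset(dx, dy)


start = Location(10, 20)
moved = start.above(5).right(3)
target = target_of(100, 300, 100, 100, dx=10, dy=-5)
```

With no offset given, `target_of()` falls back to the center, mirroring the rule stated for getTarget().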
Thanks to the invaluable comments and ideas from RaiMan (Raimund Hocke), Christian Tismer, and C K Kashyap.