HaRe – Hebrew HAndwriting REader
What exactly HaRe is: • HaRe is a handwriting recognition engine: it transforms image files into text files containing (hopefully) the text written in the image. • HaRe is written in C and is known to run on most UNIX-compatible operating systems (including Windows under Cygwin).
What are HaRe’s limitations? • So far, HaRe is known to handle the letters between bet and reish. This means the following letters will NOT be recognized: א, ת, ך, ם, ן, ף, ץ. • HaRe can only handle the PGM file format. • HaRe can only handle black-and-white images (not grayscale). • HaRe works with the “ktav” typeface and not with the “dfus” typeface. • HaRe runs from the command line (there is a GUI front end, but it is very basic).
How does HaRe work? • HaRe works in a few phases: Image file (PGM) -> Image loader -> image -> Separation -> blackelems -> Filtering -> blackelems -> Reader pass 1 -> tokens -> Reader pass 2 -> tokens with letter info -> Hebrew encoder -> Unicode (UTF-8) Hebrew text
“Outer” phases • The first phase passes the PGM file to the “Image Loader”, which stores the image in memory in HaRe’s internal format. • The last phase translates HaRe’s internal Hebrew encoding into the popular UTF-8 encoding. • The purpose of both of these phases is to let the HaRe engine communicate with the rest of the world (for input and for output). • Both phases are quite simple (only 1.71% of the code) and will not be discussed further.
“Inner” phases - separation
Separation • An image is one big collection of pixels. We want to separate that collection into parts (lines, letters, etc.); that is where separation comes in. (Note: words are a more complex way of dividing our input and are therefore handled in a later phase.)
Separation • find_piclines – Separates the image into lines. It scans from the topmost image row to the bottommost row; groups of adjacent rows that contain black pixels are considered part of the same line. • copy_linebes – Each line is then cut into its blackelems. A blackelem is defined as follows: “A collection of pixels P is a blackelem if and only if all neighbours of P are white, all pixels of P are black, and for each pixel pair (p1,p2) in P there’s a path from p1 to p2 that goes only through pixels in P”. • Data flow: Image -> find_piclines -> lines -> copy_linebes -> blackelems ordered by line.
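The two separation steps can be sketched roughly as follows. This is an illustration only, not HaRe's code: the `sketch_image` and `sketch_line` types and both function names are hypothetical, pixels are assumed to be stored as 1 (black) or 0 (white) in row-major order, and "neighbours" is assumed to mean the 8 surrounding pixels (the slides do not say whether 4- or 8-connectivity is used).

```c
#include <stdlib.h>

/* Hypothetical image type: px holds width*height bytes, 1 = black, 0 = white. */
typedef struct {
    int width, height;
    const unsigned char *px;
} sketch_image;

/* A text line, as a range of image rows (both inclusive). */
typedef struct { int top, bottom; } sketch_line;

static int row_has_black(const sketch_image *img, int row) {
    for (int col = 0; col < img->width; col++)
        if (img->px[row * img->width + col]) return 1;
    return 0;
}

/* Scan rows top to bottom; every maximal group of adjacent rows that
 * contain black pixels becomes one line. Stores at most max_lines
 * lines in out and returns how many were stored. */
int sketch_find_lines(const sketch_image *img, sketch_line *out, int max_lines) {
    int count = 0, in_line = 0;
    for (int row = 0; row < img->height; row++) {
        int black = row_has_black(img, row);
        if (black && !in_line) {
            if (count == max_lines) break;   /* no room for another line */
            out[count].top = row;
            in_line = 1;
        }
        if (black && in_line) out[count].bottom = row;
        if (!black && in_line) { in_line = 0; count++; }
    }
    if (in_line) count++;                    /* line touching the last row */
    return count;
}

/* Label connected groups of black pixels (assumed 8-connectivity).
 * labels must hold width*height ints; white pixels get 0 and the
 * pixels of the k-th group found get k. Returns the number of groups,
 * or -1 if memory could not be allocated. */
int sketch_label_blackelems(const sketch_image *img, int *labels) {
    int n = img->width * img->height, count = 0;
    if (n == 0) return 0;
    int *stack = malloc((size_t)n * sizeof *stack);
    if (!stack) return -1;
    for (int i = 0; i < n; i++) labels[i] = 0;
    for (int start = 0; start < n; start++) {
        if (!img->px[start] || labels[start]) continue;
        int top = 0;
        labels[start] = ++count;
        stack[top++] = start;
        while (top > 0) {                    /* iterative flood fill */
            int p = stack[--top];
            int r = p / img->width, c = p % img->width;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++) {
                    int nr = r + dr, nc = c + dc;
                    if (nr < 0 || nr >= img->height) continue;
                    if (nc < 0 || nc >= img->width) continue;
                    int q = nr * img->width + nc;
                    if (img->px[q] && !labels[q]) {
                        labels[q] = count;   /* mark on push: each pixel is pushed once */
                        stack[top++] = q;
                    }
                }
        }
    }
    free(stack);
    return count;
}
```

Each label produced by the second function corresponds to one set P in the definition above: all its pixels are black, they are mutually reachable through P, and every pixel bordering P is white. The sketch labels the whole image at once, whereas HaRe works line by line.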
“Inner” phases - filtering
Filtering • Our input is produced by handwriting, which may contain small “disturbances” that can confuse our algorithms. • Filtering addresses two problems: 1) Stains. 2) Problematic nvbr-forms.
What are stains? • Stains refer to small blackelems which are not letters and happen to be in the image (usually the result of someone leaving a black dot somewhere, or not fully erasing a badly written letter).
What are nvbr-forms? • nvbr-forms are geometric forms that look like the letters n, v (hence the nv) or the signs (, ) (hence the br, which stands for bracket). • nvbr-forms are used a lot in the reader phases, so we should get rid of nvbr-forms that are not important and may confuse us (usually, those that are small).
Filtering • Stain checking in copy_linebes – blackelems that appear too small or disproportionate are automatically dropped from the blackelems list produced by copy_linebes. • remove_small_nvbrs – gets rid of small nvbr-forms that may confuse our algorithms. • Is_stainline – in some cases a line of stains may escape the first check and then be identified as a line of vavs; this check makes sure that lines are not too small. • Filtering stages (blackelems in, blackelems out): stain checking clauses in copy_linebes, remove_small_nvbrs, Is_stainline.
“Inner” phases – Reader pass 1
Reader pass 1 • This pass has two main jobs: 1) Attempt to identify all letters whose form does not depend on other letters (i.e. all letters except yod and vav). 2) Transform blackelems into tokens. • Note: This pass also has the side job of compensating for the increase in average height caused by letters like kuf (by specifying an override_height and override_bottom, which are used in the average calculations instead).
Reader pass 1 • blackelems are first sent to the letter-checking routines, which give “opinions” on what the letter is. The routines is_hey and is_kuf turn on the ignore_next flag when they believe they have found the letter they are looking for. The tokenizer works in two modes: if ignore_next = 0, the token is composed of a single blackelem; otherwise it is composed of two (the current one and the next). • Data flow: blackelems -> is_bet ... is_reish -> result (which letters we found) and ignore_next; blackelems, result and ignore_next -> tokenizer -> tokens.
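The two tokenizer modes can be sketched as below. This is a minimal sketch, not HaRe's tokenizer: the `sketch_opinion` and `sketch_token` structures and the integer letter codes are hypothetical, and the letter-checking routines are assumed to have already produced one opinion per blackelem.

```c
/* Hypothetical result of the letter-checking routines for one blackelem. */
typedef struct {
    int letter;       /* assumed letter code, 0 = not identified */
    int ignore_next;  /* nonzero if the letter also covers the next blackelem */
} sketch_opinion;

/* Hypothetical token: one or two consecutive blackelems plus a letter. */
typedef struct {
    int first_elem;   /* index of the token's first blackelem */
    int elem_count;   /* 1 or 2 */
    int letter;
} sketch_token;

/* Turn n per-blackelem opinions into tokens. out must have room for n
 * tokens. Returns the number of tokens written. */
int sketch_tokenize(const sketch_opinion *ops, int n, sketch_token *out) {
    int ntok = 0;
    for (int i = 0; i < n; i++) {
        /* A pair is only possible when a next blackelem exists. */
        int pair = ops[i].ignore_next && i + 1 < n;
        out[ntok].first_elem = i;
        out[ntok].elem_count = pair ? 2 : 1;
        out[ntok].letter = ops[i].letter;
        ntok++;
        if (pair) i++;  /* the next blackelem was consumed by this token */
    }
    return ntok;
}
```

With four blackelems where only the second one sets ignore_next, the result is three tokens: the second token spans blackelems 1 and 2, and blackelem 2 never gets a token of its own.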
“Inner” phases – Reader pass 2
Reader pass 2 • This pass has two jobs: 1) Find word boundaries, and for each word calculate the average height and average bottom. 2) Attempt to identify all letters whose form does depend on other letters (yod, vav, and in the future noon sofit).
Reader pass 2 • find_wordinfo is responsible for finding word boundaries. It runs over the tokens and checks whether a token’s distance from its right neighbor (called a space) is much bigger than the average of the spaces calculated so far in the word. After find_wordinfo finds a word, it calculates the average token height and the average bottommost black token row for the word. • Afterwards, the tokens are passed along with the average info to is_vav/is_yod, which can then tell vav from yod by comparing the token’s height/bottom against the averages. • Data flow: tokens -> find_wordinfo -> word bounds -> is_vav, is_yod, is_noonsofit -> tokens with full letter information.
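The word-boundary rule can be illustrated with the sketch below. It rests on several assumptions that the slides do not confirm: tokens are ordered left to right, a space is measured as the next token's left column minus the current token's right column, and "much bigger" means more than `SKETCH_SPACE_FACTOR` (here 2.0) times the running average. All names are hypothetical.

```c
#define SKETCH_SPACE_FACTOR 2.0   /* assumed meaning of "much bigger" */

/* Hypothetical token bounding box, in image coordinates (inclusive). */
typedef struct { int left, right, top, bottom; } sketch_token_box;

/* A word: a range of token indices plus the per-word averages. */
typedef struct {
    int first, last;
    double avg_height, avg_bottom;
} sketch_word;

static void close_word(const sketch_token_box *t, int first, int last,
                       sketch_word *w) {
    double height_sum = 0.0, bottom_sum = 0.0;
    for (int i = first; i <= last; i++) {
        height_sum += t[i].bottom - t[i].top + 1;
        bottom_sum += t[i].bottom;
    }
    w->first = first;
    w->last = last;
    w->avg_height = height_sum / (last - first + 1);
    w->avg_bottom = bottom_sum / (last - first + 1);
}

/* Split n tokens into words. out must have room for n words.
 * Returns the number of words found. */
int sketch_find_words(const sketch_token_box *t, int n, sketch_word *out) {
    int nwords = 0, start = 0, space_cnt = 0;
    double space_sum = 0.0;
    for (int i = 0; i < n; i++) {
        int split = (i == n - 1);            /* last token always ends a word */
        if (!split) {
            double space = t[i + 1].left - t[i].right;
            /* Compare against the average of the spaces seen so far in
             * this word; the first space has nothing to compare with. */
            if (space_cnt > 0 &&
                space > SKETCH_SPACE_FACTOR * (space_sum / space_cnt))
                split = 1;
            else {
                space_sum += space;
                space_cnt++;
            }
        }
        if (split) {
            close_word(t, start, i, &out[nwords++]);
            start = i + 1;
            space_sum = 0.0;
            space_cnt = 0;
        }
    }
    return nwords;
}
```

One consequence of comparing against a running average is visible in the sketch: the first space of a word can never trigger a split, so a one-letter word is merged with its neighbor. How HaRe deals with that case is not stated in the slides.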
A more detailed look at the reading phases • So far I have given a general outline of how HaRe works. In this part of the presentation I would like to explain more about the reading phases, which are the most important and complex phases in HaRe (more than 50% of the code).
Lower layer: Core • This layer is designed to handle the most primitive parts of an image (usually nothing larger than an image row or column). • Some routines in this layer: check_hseq, check_vseq, check_seq, check_horz_line, check_vert_line, get_right_obstacle, get_left_obstacle, and more…
Lower layer: Core - example • We will now review an example of a routine: blackseq_list * check_horz_line(image *img,int left,int right,int row) This routine scans the image row given as a parameter, from the column given as ‘left’ to the column given as ‘right’. It returns the number of blackseqs in that horizontal line, where they are, and how long they are. (A blackseq is a run of black pixels; a good analogy is that blackseqs are to an image row what blackelems are to the entire image.)
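Finding the runs of black pixels in a row can be sketched as follows. This is not the real check_horz_line: the function name, the `sketch_blackseq` structure and the flat output array are hypothetical stand-ins for HaRe's `blackseq_list`, the row is assumed to be an array of 1 (black) / 0 (white) bytes, and ‘right’ is assumed to be inclusive.

```c
/* Hypothetical description of one run of black pixels in a row. */
typedef struct {
    int start;    /* column of the first black pixel of the run */
    int length;   /* number of consecutive black pixels */
} sketch_blackseq;

/* Find all runs of black pixels in row[left..right] (both inclusive).
 * Stores at most max_out runs in out, but always returns the total
 * number of runs found, so the caller can detect truncation. */
int sketch_row_blackseqs(const unsigned char *row, int left, int right,
                         sketch_blackseq *out, int max_out) {
    int count = 0, col = left;
    while (col <= right) {
        if (!row[col]) {          /* white pixel: keep looking */
            col++;
            continue;
        }
        int start = col;          /* first pixel of a new run */
        while (col <= right && row[col]) col++;
        if (count < max_out) {
            out[count].start = start;
            out[count].length = col - start;
        }
        count++;
    }
    return count;
}
```

Narrowing the scanned window clips runs at its borders: a run that extends beyond ‘left’ or ‘right’ is reported with only the part that lies inside the window.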
Middle layer: Form • This layer is designed to handle more complex structures in an image, known as forms. Three examples of forms are: shpitz-form, nvbr-form, stomp-form.
What are shpitz-forms? • shpitz-forms are geometric forms that look more or less like a straight or curved line. Note that shpitz-forms in HaRe are computed by walking along the form in one of four directions (left, right, up, down), so diagonals are harder to handle. (In the future it may be a good idea to make the shpitz algorithms adapt their direction to the form.)
Middle layer: Form - example • We will now review an example of a routine: int check_shpitz_form(image *img,int left,int right,int top,int bottom,shpitzinfo *y,int direction,const shpitzchecks *sc) This routine walks the image rectangle specified by the parameters ‘left’, ‘right’, ‘top’ and ‘bottom’, and searches for a shpitz that goes in the direction specified by ‘direction’ (one of LEFT, RIGHT, DOWN, UP) and satisfies all conditions given by the parameter ‘sc’. The routine returns 0 if no shpitz was found; a non-zero return value means a shpitz was found. More information on the shpitz is returned via ‘y’.
Middle layer: Form - example • It is quite clear that ‘sc’ plays an important part here, so let’s take a look at some of the checks it may contain.
Middle layer: Form - example • minlen_abs – this check dictates that a shpitz must be longer than minlen_abs pixels. • rowgroup_angle_max_dif – this check looks at the shpitz as a collection of “rowgroups” (or in some cases colgroups). Each rowgroup contains the number of rows specified by rowgroupsize and undergoes an angle calculation. A sequence of rowgroups is considered part of the shpitz as long as the difference between their angles is not greater than rowgroup_angle_max_dif. This check is useful for detecting curvature in shpitzes (think of noon vs. reish). Note that there is another, similar check: rowgroup_angle_min_dif.
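The idea behind the rowgroup check can be sketched as below, under loud assumptions. The slides do not say how HaRe computes a rowgroup's angle, so this sketch swaps in a simpler quantity: the horizontal drift of the stroke's center across the rowgroup, in pixels. The function name and the `centers` input (one stroke center per row) are hypothetical, and each rowgroup is compared with the one before it.

```c
/* Count how many leading rowgroups belong to the same shpitz.
 *
 * centers[i]   : assumed horizontal center of the stroke in row i
 * nrows        : number of rows in centers
 * rowgroupsize : rows per rowgroup (must be at least 2)
 * max_dif      : largest allowed change in drift between neighbors
 *
 * The drift of a rowgroup (center of its last row minus center of its
 * first row) stands in for the angle here. Returns the number of
 * rowgroups accepted before the direction changes too much. */
int sketch_straight_rowgroups(const double *centers, int nrows,
                              int rowgroupsize, double max_dif) {
    if (rowgroupsize < 2) return 0;
    int ngroups = nrows / rowgroupsize;  /* leftover rows are ignored */
    int accepted = 0;
    double prev_drift = 0.0;
    for (int g = 0; g < ngroups; g++) {
        int first = g * rowgroupsize;
        int last = first + rowgroupsize - 1;
        double drift = centers[last] - centers[first];
        double dif = drift - prev_drift;
        if (dif < 0) dif = -dif;         /* absolute difference */
        if (g > 0 && dif > max_dif) break;
        prev_drift = drift;
        accepted++;
    }
    return accepted;
}
```

A straight stroke keeps the same drift in every rowgroup, so all rowgroups are accepted. A stroke that starts straight and then bends is cut off at the first rowgroup whose drift differs from its predecessor by more than max_dif, which is the kind of curvature signal the noon vs. reish remark refers to.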
What are nvbr-forms? • nvbr-forms are geometric forms that look like the letters n, v (hence the nv) or the signs (, ) (hence the br, which stands for bracket). • nvbr-forms are used a lot in the reader phases, so we should get rid of nvbr-forms that are not important and may confuse us (usually, those that are small).
Middle layer: Form - example • We will now review an example of a routine: int check_nvbracket_form(image *img,int col,int row,int direction,nvforminfo *nvfi,int flags,fnvforminfo *fnvfi) This routine starts walking on the image from the point given by ‘col’, ‘row’, in the direction given by ‘direction’. While it walks, it constantly checks whether we are blocked between two blackseqs. It keeps walking until the two blackseqs unite into one (thus blocking us from walking further) or until it finds a sudden “change” in the blackseqs. Note that if the starting point is not between two blackseqs, the routine returns immediately with an error. The routine can also act differently according to the given ‘flags’. It returns info via ‘nvfi’ and in some (rare) cases modifies the ‘fnvfi’ structure. The int return value is 0 if no nvbr-form was found and non-zero if one was found. This routine is frequently called by another routine, find_nv_form, whose purpose is to find an nvbr-form without starting from a point that is blocked between two blackseqs in one of the nvbr-form’s rows (or cols), in other words, without being given a point that is already “inside” the nvbr-form. Modifying ‘fnvfi’ is usually find_nv_form’s job.
Middle layer: Form - example • Let’s have a look at some of the flags: • STOP_IF_NOT_ALONE_TOP – In some cases, we want to make sure that our nvbr-form doesn’t have anything above/below it (or to its left/right, in the case of a vertical direction). This flag tells the routine to stop with an error when it sees this requirement violated. TOP can be replaced with LEFT, RIGHT or BOTTOM, as needed. (In our example, direction is LEFT, and we use STOP_IF_NOT_ALONE_TOP).
Middle layer: Form - example • IGNORE_RIGHT_INNER_U – To understand this flag properly, keep in mind that it was originally designed with the direction UP in mind. An inner U is an nvbr that goes in the opposite direction to our nvbr and lies within the area covered by it; it causes a normal run of check_nvbr_form to fail and report that there’s no nvbr. If this flag is on and the inner U is on the right side of the nvbr-form, the form will still be recognized as an nvbr. In the example below, the routine fails if we do not use this flag and succeeds if we do.
What are stomp-forms? • Last (and least) are stomp-forms. The name stomp-form refers to a collection of adjacent blackseqs. This definition is very similar to that of a shpitz, but comes with fewer requirements (and therefore less complexity). Because of that similarity, the two definitions may be combined in the future. Most of an image’s features can be seen as stomps, so this form is rarely used.
When are stomps used? • A good example of the usage of stomps is the following: Let’s say we found an nvbr form; the check_nvbr_form routine stops at the first blackseq that acts as a unification of the nvbr’s two forms. In the example given above, using the shpitz routines could probably achieve the same check, but would be more complex (due to the many checks that can be done/undone on a shpitz).
Upper layer: is_??? routines • This layer contains routines of the form is_???(parameters) that return zero if the blackelem image (or two images, in the case of hey/kuf) isn’t the letter ???, and one otherwise. (??? may be replaced by bet, geemel, daled, …).
Upper layer: is_??? routines - example • We will now review an example of a routine: int is_otmem(image *img,rect re,tokeninfo *ti) This routine works on the image of a blackelem given as a parameter. It receives additional info gathered earlier on the blackelem through a tokeninfo structure ‘ti’ (currently this info includes the estimated thickness of the pen used to draw the blackelem, and the position of the blackelem in the original image). The routine returns 0 if the blackelem is not a mem (מ), non-zero if it is. Note: rect re – this parameter dates from the days when this routine was given the entire image and worked only on a specific rect within it; currently it is a dummy value set to the entire rectangle of the blackelem image. Note: In the code, we refer to mem as otmem to prevent confusion, since mem is usually used as an abbreviation of MEMory. • We will take a real mem and see what checks the routine runs on it.
Upper layer: is_??? routines - example • The first thing is_otmem does is call find_nv_form twice to find the two nvbr forms of the letter. Afterwards, we make sure that the nvbr forms are big enough. This check compares their size to the pen’s thickness; since the forms are vertical, the size checked is their height, more precisely the number of rows “inside” them.
Upper layer: is_??? routines - example • Afterwards is_otmem checks the area on the right side of the v-form, above the topmost row that contains both of the v-form’s sides. It checks that the blackelem there is an appropriate shpitz form. A similar check is also done on the n-form. The shpitz check is done by calling check_otmemish_head, a wrapper that sets up a shpitzchecks struct (‘sc’) parameter for check_shpitz_form and calls it. Afterwards we run a size check similar to the previous one, but this time covering both the nvbr form’s two-sided area and the shpitz (unlike the previous size check, this one is run against the total height of the blackelem).
Upper layer: is_??? routines - example • Finally, a check is done on the horizontal distance between the two “peaks” of the nvbr forms. We make sure that it’s not too small (in comparison to the width of the entire blackelem).
Upper layer: is_??? routines - example • If all these checks succeed, the blackelem is considered to be the letter mem (מ).