KAUNAS UNIVERSITY OF TECHNOLOGY

SIGNALS THEORY PAPER WORK

WORD RECOGNITION AND ITS METHODS

Student: Amir Salha
Lecturer: Marius Gudauskis
Group: MDU-5/2

KAUNAS 2017

Table of Contents
I. Abstract
II. Introduction to Word Recognition
III. Visual Word Recognition
i. Theoretical Approach
Bayes theorem
Connectionism
Interactive activation (IA) model
Lexical competition
Lexical decision
Masked priming
Neighbourhood density
Open bigrams
Reaction time (RT) distribution
Word-frequency effect
ii. Practical Approach
Step 1: Detect Candidate Text Regions Using MSER
Step 2: Remove Non-Text Regions Based on Basic Geometric Properties
Step 3: Remove Non-Text Regions Based on Stroke Width Variation
Step 4: Merge Text Regions for Final Detection Result
Step 5: Recognize Detected Text Using OCR
Text Recognition Using the OCR Function
Challenges Obtaining Accurate Results
Image Pre-processing Techniques to Improve Results
ROI-based Processing to Improve Results
IV. Speech-Recognition Systems and an Example
The Development Workflow
Acquiring Speech
Analyzing the Acquired Speech
Developing a Speech-Detection Algorithm
Developing the Acoustic Model
Selecting a Classification Algorithm
Building the User Interface
V. Conclusion
VI. References

I. Abstract:
This paper examines what word recognition is and the methods used to approach it. It introduces visual and speech word recognition and their different approaches, providing examples and real-life computation.

II. Introduction to Word Recognition:
Word recognition is a computational model that converts visual content (pictures, video) or speech (sound, a genuine voice) into a real text document. This computational model can be applied with various strategies and methods, either by scanning the text, as in OCR tools, or by taking live pictures.
In addition, it can be voice or speech recognition of words, as in computational linguistics, which develops strategies and technologies that enable computers to recognize and translate spoken language into text.

III. Visual Word Recognition:
Visual word recognition is a computational form of recognition, belonging to the branch of software engineering that involves reading text from various media and translating the pictures, videos, or live layouts into a form that the computer can manipulate (for instance, into ASCII codes).
i. Theoretical Approach:
Bayes theorem: a mathematical technique for updating probabilities, or beliefs, on obtaining new evidence. In the case of word recognition, the probability of a word given the information, or evidence, is as follows:
Figure 1: Mathematical equation used
Connectionism: models expressed as artificial neural networks. These models are intended to capture general properties of neurons, or neuronal populations.
Interactive activation (IA) model: words are represented as nodes in a network and are connected by inhibitory links.
Figure 2: The top panel illustrates a simplified interactive activation model.
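Figure 1 is not reproduced in this text; the standard form of Bayes' theorem for this case, written out for reference (a reconstruction of the well-known theorem, not of the figure itself), is:

```latex
P(\text{word} \mid \text{evidence}) =
  \frac{P(\text{evidence} \mid \text{word}) \; P(\text{word})}{P(\text{evidence})}
```

That is, the probability of a word given the evidence is the probability of the evidence given the word, weighted by the word's prior probability and normalized by the overall probability of the evidence.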
Lexical competition: neighbouring words compete with one another for recognition; this is due to the inhibitory connections between word nodes.
Lexical decision: participants are required to decide whether a string of letters is a word or not (a nonword).
Masked priming: a variant of the lexical decision task in which the target is preceded by a briefly displayed prime, which can be a word or a nonword. The prime is generally displayed in lower case and the target in upper case to limit physical overlap. Masked priming is most often used to address questions concerning the representation of orthography.
Neighbourhood density: a measure of how similar a word is to other words. A typical measure is the number of other words that can be formed by changing a single letter of the word; this means that only words of the same length can be neighbours.
A more flexible measure defines similarity in terms of the number of ‘edits’ (additions, deletions, and substitutions), so WORD and WORDS are then considered neighbours.
Open bigrams: a proposal that the order of letters in a word is coded as a set of ordered letter pairs, which may be non-adjacent. WORD may be coded as WO, WR, WD, OR, OD, and RD.
Figure 3: Three different representations of letter order.
Reaction time (RT) distribution: factors such as word frequency shift the mean of the distribution, and more often than not they change the shape of the distribution as well.
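The open-bigram coding described above can be made concrete with a short sketch (Python is used here for illustration only; the paper itself presents no code):

```python
from itertools import combinations

def open_bigrams(word):
    """All ordered, possibly non-adjacent letter pairs of a word."""
    return {a + b for a, b in combinations(word, 2)}

# WORD yields exactly the six pairs listed in the text above.
print(sorted(open_bigrams("WORD")))  # ['OD', 'OR', 'RD', 'WD', 'WO', 'WR']
```

Note that the pairs keep the left-to-right order of the letters, which is what distinguishes open bigrams from an unordered letter set.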
Word-frequency effect: the most significant influence on how quickly a word is identified is its frequency of occurrence in the language. Words that appear frequently in the language are recognized more rapidly than low-frequency words. The speed and ease with which words can be recognised is an approximately logarithmic function of word frequency.

ii. Practical Approach:
a) Automatically detect and recognize text in natural images:
The technique used here detects and distinguishes discrete text in an image that contains textual content.
This method addresses situations where the position of the text within the image is not known beforehand. The automated text-detection algorithm identifies a large number of candidate text regions and progressively removes those unlikely to contain text.

Steps followed:
Step 1: Detect Candidate Text Regions Using MSER;
Figure 4: MSER regions
Step 2: Remove Non-Text Regions Based on Basic Geometric Properties;
Geometric properties used to separate text from non-text regions:
· Aspect ratio
· Eccentricity
· Euler number
· Extent
· Solidity
Figure 5: After removing non-text regions based on basic geometric properties
Step 3: Remove Non-Text Regions Based on Stroke Width Variation;
Figure 6: After removing non-text regions based on stroke width variation
Step 4: Merge Text Regions for Final Detection Result;
Figure 7: Expanded bounding boxes around text
Step 5: Recognize Detected Text Using OCR;
Figure 8: Detected text

b) Recognition of text using optical character recognition (OCR):
ü Optical character recognition refers to a branch of software/computer engineering. It is a technique that involves reading text from different sources of clear printed text and translating the images into a form that the computer can manipulate.

Text Recognition Using the OCR Function:
Text recognition using the OCR function has various applications, such as image search, document analysis, and robot navigation.
o The OCR function works as follows:
1. returns the recognized text,
2. returns the recognition confidence,
3. returns the location of the text in the original image.
Note: This information identifies the location of misclassified text within the image; using the confidence values, errors can be spotted before any further processing takes place.

Challenges Obtaining Accurate Results:
OCR performance depends on the accuracy, stability, and uniformity of the text: if the text is static and stable and has a word-like format, performance will be far more accurate than with non-uniform or unclear text, where additional initial processing steps must be taken into account.
Figure 9: These images show how the OCR initial processing steps changed the image into a steadier, more uniform image, allowing clear character recognition of the text.
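The confidence-based error screening described above can be sketched as follows. The `(word, confidence, bounding box)` tuples are hypothetical stand-ins for what an OCR engine returns (the paper uses MATLAB's OCR function; this Python sketch only illustrates the filtering idea):

```python
# Hypothetical OCR output: (recognized word, confidence, bounding box).
results = [
    ("HANDICAPPED", 0.91, (12, 40, 210, 32)),
    ("PARK1NG", 0.48, (12, 80, 160, 30)),   # low confidence: likely misread
    ("ONLY", 0.88, (12, 120, 90, 28)),
]

def reliable_words(results, threshold=0.6):
    """Keep only detections whose confidence meets the threshold."""
    return [(word, box) for word, conf, box in results if conf >= threshold]

print(reliable_words(results))
```

Low-confidence regions can then be flagged for re-processing (e.g. with different binarization) instead of being passed downstream as-is.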
o The ‘TextLayout’ parameter helps improve the results when no text is recognized in the image because of irregularities in the background, which cause the OCR to fail in finding text margins and elements, leading to recognition failure.
Note: If the OCR keeps failing after the text layout has been set, the initial pre-processing steps should be checked to find the cause of the failure; this can be done with initial binarization steps that improve text segmentation.

Image Pre-processing Techniques to Improve Results:
Step 1: Pre-processing using morphological reconstruction;
This process cleans the image by removing the unclear elements and residues already present in it.
Figure 10: Removing artifacts and producing a cleaner image for OCR.
Step 2: The ‘Locate Text’ method;
Locating the text in the original image helps to recognize the needed characters, especially when they share the same parameters, while ignoring all unnecessary text. This method is used when residual noise still obscures the image.
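One common binarization step of the kind the note above refers to is Otsu's method, which picks a global threshold separating dark text from a bright background. A self-contained NumPy sketch (a standard technique, not necessarily the exact pre-processing the paper used):

```python
import numpy as np

def otsu_threshold(gray):
    """Threshold maximizing between-class variance (Otsu's method)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, 0.0
    for t in range(256):
        w0 += hist[t]              # pixels at or below threshold t
        if w0 == 0:
            continue
        w1 = total - w0            # pixels above threshold t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Tiny synthetic "image": dark text pixels (~30) on a bright background (~220).
img = np.array([[220, 222, 30, 221],
                [219, 31, 29, 220]])
t = otsu_threshold(img)
binary = img > t  # True = background, False = text
```

On real scanned pages the same call separates ink from paper, giving the OCR stage a clean two-level input.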
Figure 11: Ignoring irrelevant text using the locate-text method

ROI-based Processing to Improve Results:
Specific regions of the image for OCR to process are identified either by selecting the needed regions manually or by automating the process. Automated text detection detects and recognizes text in natural images using vision-based blob analysis.
Figure 12: Many connected regions within the keypad image
Small regions that are unlikely to contain any text can be removed using blob analysis: regions whose area is smaller than an assumed minimum are discarded.

IV. Speech-Recognition Systems and an Example:
Speech recognition is a method of detecting words in a speaker's speech with high accuracy and precision, using filtering systems and adaptation methods to remove all types of noise and obtain a clear voice signal, leading to better recognition of words.

Types of speech-recognition systems:
· Isolated: requires a brief pause between spoken words.
· Continuous: pauses are not necessary.
Note: Designing a speech-recognition algorithm is a complex task requiring detailed knowledge of signal processing and statistical modelling.

The Development Workflow:
Building a speech-recognition system involves two stages:
1. The first stage is the teaching stage, known as training mode, in which the system is taught words so that it has reference data for detecting the words spoken in the speech.
2. The second stage is the testing stage: once the system has sufficient word data and a trusted reference dictionary, it is tested to check how it reacts to real-life input and to find what problems remain to be solved.

The development workflow consists of three steps:
o Speech acquisition
o Speech analysis
o User interface development

Acquiring Speech:
Training stage: during the training stage we use a microphone to record the spoken words; each digit in the dictionary is repeated, stored in the database, and the system is tested in an offline analysis.
Testing stage: speech is continuously streamed into the environment for online processing; speech samples are acquired into a continuous buffer, and the incoming speech is processed frame by frame.
We use Data Acquisition Toolbox™ to set up continuous acquisition of the speech signal and simultaneously extract frames of data for processing.

Analysing the Acquired Speech:
1. Develop a word-detection algorithm that separates each word from noise.
2. Derive a model that provides a representation of each word at the training stage.
3. Select an appropriate classification algorithm for the testing stage.

Developing a Speech-Detection Algorithm:
Ø The algorithm is developed from the initially recorded speech frames in a loop; it detects isolated digits over a specific period of time, frame by frame, using zero-crossing counts and signal energy computed for the different speech frames.
Note: Signal energy works well for detecting voiced signals; zero-crossing counts work well for detecting unvoiced signals.

Developing the Acoustic Model:
The acoustic model depends on speech characteristics and allows the system to distinguish the different words in the database, known as the dictionary.
Ø Investigate the frequency characteristics of the human vocal tract by examining power spectral density (PSD) estimates of various spoken digits.
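The two frame-level measures used by the speech-detection algorithm above, signal energy and zero-crossing count, can be sketched directly (a NumPy sketch with synthetic frames; the paper's actual recordings are not reproduced here):

```python
import numpy as np

def short_time_energy(frame):
    """Energy of one frame; high for voiced speech."""
    return float(np.sum(np.asarray(frame, dtype=float) ** 2))

def zero_crossing_count(frame):
    """Number of sign changes in the frame; high for unvoiced/noisy speech."""
    signs = np.sign(frame)
    signs[signs == 0] = 1
    return int(np.sum(signs[:-1] != signs[1:]))

# Synthetic frames: a low-frequency "voiced" tone vs. quiet noise ("unvoiced").
fs, n = 8000, 256
t = np.arange(n) / fs
voiced = np.sin(2 * np.pi * 120 * t)                    # 120 Hz tone
unvoiced = 0.1 * np.random.default_rng(0).standard_normal(n)

# Voiced frame: high energy, few crossings; unvoiced: low energy, many crossings.
```

Thresholding these two measures per frame is what lets the detector mark where an isolated digit starts and ends.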
Figure 14b: Yule-Walker PSD estimates of three different utterances of the word “TWO.”
Figure 14a: Yule-Walker PSD estimates of three different utterances of the word “ONE.”
Ø Measure the energy of overlapping frequency bins of the spectrum on a mel frequency scale using Mel Frequency Cepstral Coefficients (MFCCs).
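The mel scale underlying MFCC analysis is typically computed with the HTK-style formula below (a standard definition; the paper does not specify which variant it used):

```python
import math

def hz_to_mel(f_hz):
    """Hz to mel (HTK formula): roughly linear below 1 kHz, log above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mel-to-Hz conversion."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Mel filter banks place their centre frequencies evenly on this scale,
# giving finer resolution at low frequencies where speech cues concentrate.
print(round(hz_to_mel(1000)))  # 1000
```

Spacing the overlapping frequency bins evenly in mel rather than in Hz is what makes the resulting coefficients match human frequency sensitivity.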
Ø By combining all the feature vectors and estimating the multidimensional probability density function (PDF) of the vectors for a specific digit, and repeating this process for each digit, we obtain an acoustic model for each digit.
Ø In the testing stage, we extract the MFCC vectors from the test speech and use a probabilistic measure to determine the source digit with maximum likelihood.
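The maximum-likelihood decision described above can be sketched with a deliberately simplified model: one univariate Gaussian per digit instead of the full multidimensional GMM. All numbers are hypothetical:

```python
import numpy as np

def log_likelihood(frames, mean, var):
    """Total log-likelihood of feature values under a univariate Gaussian."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                + (frames - mean) ** 2 / var)))

def classify(frames, models):
    """Return the digit whose model gives the highest log-likelihood."""
    return max(models, key=lambda d: log_likelihood(frames, *models[d]))

# Hypothetical per-digit models: (mean, variance) of a 1-D "feature".
models = {"one": (0.0, 1.0), "two": (5.0, 1.0)}
test_frames = np.array([4.8, 5.1, 5.3, 4.9])
print(classify(test_frames, models))  # "two"
```

The real system does the same argmax, but over GMM likelihoods of multidimensional MFCC vectors rather than a single Gaussian of a scalar feature.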
Figure 15: Distribution of the first dimension of MFCC feature vectors for the digit one.
Ø The distribution is well fitted by standard distributions, so it is not arbitrary.
Figure 16: Overlay of estimated Gaussian components (red) and overall Gaussian mixture model (green) on the distribution.
Definition: A Gaussian mixture model (GMM) density is parameterized by the mixture weights, mean vectors, and covariance matrices of all its component densities.
Ø An iterative expectation-maximization (EM) algorithm is used to obtain a maximum-likelihood estimate of the GMM parameters for a set of MFCC feature vectors.
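Written out, the mixture density in the definition above is (a standard formulation, reconstructed for reference):

```latex
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} w_k \,
  \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \sum_{k=1}^{K} w_k = 1,\quad w_k \ge 0,
```

where the weights $w_k$, mean vectors $\boldsymbol{\mu}_k$, and covariance matrices $\boldsymbol{\Sigma}_k$ are exactly the parameters listed in the definition.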
Ø The Statistics and Machine Learning Toolbox distribution functions are used to estimate the GMM parameters.

Selecting a Classification Algorithm:
1. During the testing stage, use the dictionary of GMMs estimated for each digit.
2. From the test speech, extract the MFCC feature vectors from each frame of the detected word.
3. Find the digit model with the maximum a posteriori probability for the set of test feature vectors.
4. Compute the log-likelihood value with the posterior function in Statistics and Machine Learning Toolbox, given a digit model and a set of test feature vectors.

Building the User Interface:
Ø After developing the digit-recognition system in an offline environment with pre-recorded speech, create an interface that displays the time-domain plot of each detected word as well as the classified digit.
Figure 17: Interface to the final application.

V.
Conclusion:
ü Visual word recognition is a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files, and images captured by a digital camera, into editable and searchable data or real live images, using different methods and software (e.g. OCR, MICR, and others), where each program has a different computational format capable of diagnosing scanned words and turning them into a real document.
ü Recognition of visual word text requires many stages, especially if the background of the text's template is non-uniform or unclear, in which case pre-processing steps must be taken to remove the unclarity and to specify the exact margins and regions of the text.
ü Speech-recognition software programs work by analysing spoken speech and converting it into words.
ü Building a speech-recognition system involves two stages: training the system by providing the needed database (dictionary) so that it can recognize the frames and the digits or words, and then testing the system in live processing to check whether it functions appropriately; finally, an interface is built that displays the results in real time with respect to the word frames.
ü A major challenge facing speech-recognition technology is that an effective voice user interface requires strong error resistance and the capacity to effectively exhibit the capabilities of the design.

VI. References:
1. MATLAB support main website, www.mathworks.
com/company/newsletters/articles/
2. Chen, Huizhong, et al. “Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions.” Image Processing (ICIP), 2011 18th IEEE International Conference on. IEEE, 2011.
3. Gonzalez, Alvaro, et al. “Text Location in Complex Images.” Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012.
4. Li, Yao, and Huchuan Lu. “Scene Text Detection via Stroke Width.” Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012.
5. Neumann, Lukas, and Jiri Matas. “Real-Time Scene Text Localization and Recognition.” Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
6. Smith, Ray. “Hybrid Page Layout Analysis via Tab-Stop Detection.” Proceedings of the 10th International Conference on Document Analysis and Recognition. 2009.
7. Morton, J. “The Interaction of Information in Word Recognition.” Psychol. Rev. 1969;76:165–178.
8. Davis, C.J. “The Spatial Coding Model of Visual Word Identification.” Psychol. Rev. 2010;117:713–758.