PHP Spell Checker

August 18th, 2009 by Dragos

This package can be used to check the spelling of text and get fix suggestions.

There is a base class that defines functions for spell checking.

There are also two classes for checking the spelling of text using either PHP scripts that contain arrays of valid words and grammar definitions, or using the hunspell program.

If the classes find misspelled words, they can return suggestions for possible fixes.
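To give a feel for how fix suggestions can be produced, here is a minimal sketch that ranks dictionary words by edit distance using PHP's built-in levenshtein() function. The helper name suggestWords() and its parameters are hypothetical and are not part of the package; the classes may compute suggestions quite differently.

```php
<?php
// Hypothetical helper (NOT part of the package): rank dictionary words by
// edit distance to a misspelled word using PHP's built-in levenshtein().
// Note: levenshtein() compares bytes, so multibyte text needs extra care.
function suggestWords(string $word, array $dictionary, int $maxDistance = 2, int $limit = 5): array
{
    $candidates = array();
    foreach ($dictionary as $entry) {
        $distance = levenshtein(strtolower($word), strtolower($entry));
        // skip exact matches and words that are too far away
        if ($distance > 0 && $distance <= $maxDistance) {
            $candidates[$entry] = $distance;
        }
    }
    asort($candidates); // closest matches first
    return array_slice(array_keys($candidates), 0, $limit);
}

// tiny illustrative word list, not a real dictionary
$dictionary = array("quick", "quiet", "brown", "lazy", "dog", "fox");
print_r(suggestWords("quik", $dictionary));
```

Scanning a whole dictionary this way is slow for large word lists, so treat it only as an illustration of the idea.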

  1. Installation
    This class uses affix dictionaries. This means that you can use the dictionaries from OpenOffice or Mozilla products.
    Just go to https://addons.mozilla.org/en-US/firefox/browse/type:3 and download the dictionaries that you want.
    Unpack the *.xpi files and extract the *.aff and *.dic files into the dictionaries/hunspell/ folder.
    This class also works with MySpell dictionaries.
    There is nothing more you have to do; just start using the class. Please read on and then take a look at the examples.
  2. HunSpell version
    The hunspell version of this class is preferred.
    It uses less memory for complex languages or very large dictionaries (over 150,000 words), and its word suggestions are excellent.
    You can get the binaries from http://sourceforge.net/projects/hunspell/files/Hunspell/1.2.8/
    When using this version of the class, you need to either specify where hunspell is installed by calling the setHunspellPath() method
    or add hunspell to the system path.
  3. PHP version
    If you cannot install hunspell on your server, you can use the pure PHP version.
    Its word suggestions are not as good as those of the HunSpell version (a new algorithm is planned).
    The PHP version can also use a lot of memory for complex languages or very large dictionaries (over 150,000 words).
    For example, with a German dictionary the script can use over 100 MB, while for simple languages it uses only about 10 MB.
    The affix dictionaries cannot be used as they are; they need to be converted to PHP code.
    This can be done manually by calling the compileHunAffixDictionary() method.
    If a dictionary has not been compiled yet, the class tries to compile it from the hunspell dictionary files the first time you use it.
    For complex languages the compiled file can exceed 15 MB.
  4. Known bugs, accuracy and performance concerns
    There are several known bugs:

    • Initially, if a word had its suffix doubled by mistake, the spell checker reported it as correct (pure PHP version only). A fix has been implemented that resolves this, but a word that is correctly spelled with a doubled suffix will now be reported as wrong. You can remove the fix by commenting out lines 93-95 in PHPSpellChecker.class.php.
    • Some languages need a special encoding, which is not supported yet (support is planned).

    Accuracy: (pure PHP version)

    • Word suggestion is not very smart; it is just a simple “algorithm”. I plan on improving this (maybe you can help out).

    Performance:

    • For very complex languages or very large dictionaries (over 150,000 words), the pure PHP version can use a lot of memory (about 100 MB).
  5. Conclusion
    The PHP version can be used very efficiently for simple languages or small dictionaries (e.g. English).
    This is just the first release, and it will be greatly improved.
    If you want to participate in this project, please contact me.
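
The affix dictionaries mentioned above pair a word list (*.dic) with prefix and suffix rules (*.aff). As a rough illustration of what turning such a dictionary into PHP data involves, the sketch below expands entries using a few suffix rules written in the usual Hunspell SFX layout. The function names parseSuffixRules() and expandEntry() are hypothetical and not taken from the package; real affix files support far more (prefixes, cross products, encodings), and compileHunAffixDictionary() may work quite differently.

```php
<?php
// Simplified sketch (NOT the package's code): expand Hunspell-style
// dictionary entries such as "pony/S" using suffix (SFX) rules.

// Parse rule lines of the form "SFX flag strip add condition".
// Header lines such as "SFX S Y 4" have only four fields and are skipped.
function parseSuffixRules(string $aff): array
{
    $rules = array();
    foreach (preg_split('/\R/', $aff) as $line) {
        $parts = preg_split('/\s+/', trim($line));
        if (count($parts) < 5 || $parts[0] !== 'SFX') {
            continue;
        }
        list(, $flag, $strip, $add, $condition) = $parts;
        $rules[$flag][] = array(
            'strip'     => $strip === '0' ? '' : $strip, // "0" means strip nothing
            'add'       => $add === '0' ? '' : $add,     // "0" means add nothing
            'condition' => $condition,
        );
    }
    return $rules;
}

// Return the stem plus every form produced by the entry's suffix flags.
// Assumes single-character flags and conditions that do not contain "~".
function expandEntry(string $entry, array $rules): array
{
    $pieces = explode('/', $entry, 2);
    $stem   = $pieces[0];
    $flags  = isset($pieces[1]) ? str_split($pieces[1]) : array();
    $forms  = array($stem);
    foreach ($flags as $flag) {
        if (!isset($rules[$flag])) {
            continue;
        }
        foreach ($rules[$flag] as $rule) {
            // the condition is a regex-like pattern for the end of the stem
            if ($rule['condition'] !== '.' && !preg_match('~' . $rule['condition'] . '$~', $stem)) {
                continue;
            }
            $strip = $rule['strip'];
            if ($strip !== '' && substr($stem, -strlen($strip)) !== $strip) {
                continue; // stem does not end with the text to strip
            }
            $base    = $strip === '' ? $stem : substr($stem, 0, -strlen($strip));
            $forms[] = $base . $rule['add'];
        }
    }
    return $forms;
}

// Illustrative plural rules in SFX layout
$aff = implode("\n", array(
    "SFX S Y 4",
    "SFX S y ies [^aeiou]y",
    "SFX S 0 s [aeiou]y",
    "SFX S 0 es [sxzh]",
    "SFX S 0 s [^sxzhy]",
));
$rules = parseSuffixRules($aff);
print_r(expandEntry("pony/S", $rules));
print_r(expandEntry("fox/S", $rules));
```

Expanding every entry up front is what makes a compiled dictionary large for heavily inflected languages, which fits the memory figures quoted above.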

Example (PHP Based):

<?php

// include the class
require_once(dirname(__FILE__)."/../PHPSpellChecker.class.php");

// instantiate the class
$spellCheck = new PHPSpellChecker();

/////////////////////////
// set some text to check
$text1 = "Die Commerzbank blickt besorgt in die Zukunft: Das Geldhaus rechnet in der zweiten Jahreshälfte mit einer Zunahme von Kreditausfällen - denn Firmen wie Privatkunden bekommen Probleme, ihre Schulden zu bedienen. Schon jetzt hat das Institut vorsichtshalber knapp eine Milliarde Euro zurückgelegt";
$result = $spellCheck->checkSpelling($text1, "de-DE"); // should return an empty array (text is correct)
//print_r($spellCheck->getWarnings());// get all warnings
//print_r($spellCheck->getErrors());// get all errors
if (count($result) == 0) {
    print "Text is OK !\n";
} else {
    print "Text has errors !\n";
    print "<pre>";
    print_r($result);
    print "</pre>";
}
$spellCheck->clearWarnings(); // clear all previous warnings
$spellCheck->clearErrors(); // clear all previous errors

$text1 = "PHP: the quik browm fox jumps over the lazi dog"; // this text has 3 errors
$result = $spellCheck->checkSpelling($text1, "en-US"); // will return an array with the wrong words with associated suggestions
//$result = $spellCheck->checkSpelling($text1, "en-US", false); // will return an array with the wrong words without associated suggestions
//print_r($spellCheck->getWarnings());// get all warnings
//print_r($spellCheck->getErrors());// get all errors
if (count($result) == 0) {
    print "Text is OK !\n";
} else {
    print "Text has errors !\n";
    print "<pre>";
    print_r($result);
    print "</pre>";
}

?>

Example (HunSpell Based):

<?php
// include the class
require_once(dirname(__FILE__)."/../HunSpellChecker.class.php");

// instantiate the class
$spellCheck = new HunSpellChecker();
$spellCheck->setHunspellPath(dirname(__FILE__)."/../hunspell"); // set path for windows systems
/////////////////////////
// set some text to check
$text1 = "Die Commerzbank blickt besorgt in die Zukunft: Das Geldhaus rechnet in der zweiten Jahreshälfte mit einer Zunahme von Kreditausfällen - denn Firmen wie Privatkunden bekommen Probleme, ihre Schulden zu bedienen. Schon jetzt hat das Institut vorsichtshalber knapp eine Milliarde Euro zurückgelegt";
$result = $spellCheck->checkSpelling($text1, "de-DE"); // should return an empty array (text is correct)
//print_r($spellCheck->getWarnings());// get all warnings
//print_r($spellCheck->getErrors());// get all errors
if (count($result) == 0) {
    print "Text is OK !\n";
} else {
    print "Text has errors !\n";
    print "<pre>";
    print_r($result);
    print "</pre>";
}
$spellCheck->clearWarnings(); // clear all previous warnings
$spellCheck->clearErrors(); // clear all previous errors

$text1 = "PHP: the quik browm fox jumps over the lazi dog"; // this text has 3 errors
$result = $spellCheck->checkSpelling($text1, "en-US"); // will return an array with the wrong words with associated suggestions
//$result = $spellCheck->checkSpelling($text1, "en-US", false); // will return an array with the wrong words without associated suggestions
//print_r($spellCheck->getWarnings());// get all warnings
//print_r($spellCheck->getErrors());// get all errors
if (count($result) == 0) {
    print "Text is OK !\n";
} else {
    print "Text has errors !\n";
    print "<pre>";
    print_r($result);
    print "</pre>";
}
?>
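
For readers curious about what talking to the hunspell program can look like, here is a small sketch that parses the output of hunspell's pipe mode (the -a option), where each checked word produces one result line. The function name parseHunspellPipeOutput() is hypothetical, the sample output is hand-written to show the line shapes rather than captured from a real run, and HunSpellChecker may communicate with hunspell in a different way.

```php
<?php
// Hypothetical helper (NOT part of the package): parse the result lines
// printed by "hunspell -a". Relevant line shapes in pipe mode:
//   *                                word is correct
//   & word count offset: s1, s2      misspelled, with suggestions
//   # word offset                    misspelled, no suggestions
function parseHunspellPipeOutput(string $output): array
{
    $misspelled = array();
    foreach (preg_split('/\R/', $output) as $line) {
        if (preg_match('/^& (\S+) \d+ \d+: (.*)$/', $line, $m)) {
            $misspelled[$m[1]] = explode(', ', $m[2]);
        } elseif (preg_match('/^# (\S+) \d+$/', $line, $m)) {
            $misspelled[$m[1]] = array();
        }
    }
    return $misspelled;
}

// One possible way to obtain the output, assuming hunspell is on the
// system path and an en_US dictionary is installed:
// $output = shell_exec('echo ' . escapeshellarg($text) . ' | hunspell -a -d en_US');

// Hand-written sample in the same shape as pipe-mode output
$sample = "*\n& quik 3 4: quick, quirk, quit\n# xqzt 9\n";
print_r(parseHunspellPipeOutput($sample));
```

The result maps each misspelled word to its list of suggestions, with an empty list when hunspell offers none.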

In the package you will also get an interactive example page so you can easily test it out.

Download the class; I hope it’s useful to you.


2 comments

  1. Adrian says:

    Very good job, this is a great piece of code and it saved me hours of work! Thank you!

  2. Frédéric Glorieux says:

    Hi,

    I’m trying to have a look at your code, but PHPClasses does not seem to work (I’m not able to stay registered): « Application problem. Sorry, for the time being Icontem Accounts is not available. »

    Is your code available somewhere else ?
