Clone this repo:
  1. 2a8cae4 HISTORY: Entries for v1.7.0 by James D. Forrester · 9 days ago master 1.7.0
  2. 55bad21 Add a HISTORY.md file tracking previous commits by James D. Forrester · 9 days ago
  3. 28721c7 Add format controls from Mongolian range as invisible by Umherirrender · 3 weeks ago
  4. a6bfb4d Add U+1680 ogham space mark and U+3000 ideographic space by Umherirrender · 3 weeks ago
  5. 9592947 Add U+061C ARABIC LETTER MARK as invisible by Umherirrender · 3 weeks ago

Equivset

A mapping of "equivalent" or similar-looking characters (homoglyphs) to prevent spoofing. This is similar to the Unicode Consortium's confusables.txt with some significant differences. Confusables.txt lists character pairs that are visually identical or nearly identical, for example, Latin "A" and Greek "Α" (alpha). This list is much broader, including pairs that merely look similar, for example, "S" and "$". Another difference is that this list only includes letters and punctuation. It does not include symbols, emoji, or graphical elements.

Installation

Using composer: Add the following to the composer.json file for your project:

{
  "require": {
     "wikimedia/equivset": "^1.0.0"
  }
}

And then run composer update.

Usage

use Wikimedia\Equivset\Equivset;

$equivset = new Equivset();

// Normalize a string
echo $equivset->normalize( 'sp00f' ); // SPOOF

// Get a single character.
if ( $equivset->has( 'ɑ' ) ) {
	$char = $equivset->get( 'ɑ' );
}
echo $char; // A

// Loop over entire set.
foreach ( $equivset as $char => $equiv ) {
	// Do something.
}

// Get the entire set.
$all = $equivset->all();

Contributing

All changes should be made to ./data/equivset.in. Then run bin/console generate-equivset to generate the JSON, PHP, and plain text versions of the equivset in ./dist.

When releasing, update HISTORY.md with git log --format='* %s (%aN)' --topo-order to consistently list commits.