// International Chemical Identifier KEY Regex, by lo sauer - lsauer.com
// The InChIKey, or hashed InChI, is a fixed-length condensed digital representation
// of the InChI, which tries to be unique but is not human-comprehensible. It uses a BASE26 alphabet (Hexavigesimal)!
// Its length is 25 characters in the layout quoted here and 27 characters (14-10-1) in the standard form.
// In the 25-character layout, the last character is computed from the rest of the InChIKey.
// The InChIKey specification facilitates web searches for chemical compounds, owing to sufficiently unique referencing
// of compounds with a concise key, which is problematic with the full-length InChI (e.g. a GET URL limit of 1600 chars).
// From the official documents ( http://chemdata.nist.gov/InChI/inchi-hash.pdf ):
// "The InChIKey is a character signature based on a hash code of the InChI string. Also, this hash
// may serve as a checksum for verifying InChI, for example, after transmission over a network."
// InChIKey has four (4) distinct components: a 14-character hash of the basic (Mobile-H)
// InChI layer (without /p segment accounting for added or removed protons); an 8-character
// hash of the remaining layers; 1 flag character indicating selected features (e.g.
// presence of fixed-H layer); and 1 “check” character. The overall length of
// InChIKey is fixed at 25 characters, including separator:
//   AAAAAAAAAAAAAA-BBBBBBBBCD
// This is significantly shorter than a typical InChI string (for example, the average length
// of InChI string for Pubchem collection is 146 characters).
// --------------------------------
// InChIKey layout is as follows:
// --------------------------------
//   AAAAAAAAAAAAAA
//     First block (14 letters)
//     Encodes molecular skeleton (connectivity)
//   BBBBBBBB
//     Second block (8 letters)
//     Encodes proton positions (tautomers), stereochemistry, isotopomers, reconnected layer
//   C
//     Flag character
//     Indicates InChI version, presence of a fixed-H layer, isotopes, and stereochemical
//     information.
//   D
//     Check character, obtained from all symbols except delimiters, i.e. from
//       AAAAAAAAAAAAAABBBBBBBBC
// All symbols except the delimiter (a dash, that is, a minus) are uppercase English letters
// representing a “base-26” encoding.
// see also: http://en.wikipedia.org/wiki/Hexavigesimal

// InChIKey v1.02 length: 14-10-1 (27 characters including both dashes)
// InChIKey v1.02 for morphine is BQJCRHHNABKAKU-KBQPJGBKSA-N
// A check written for the 25-character layout rejects it:
var x = 'BQJCRHHNABKAKU-KBQPJGBKSA-N';
25 === x.length && '-' === x[14] && /^[A-Z\-]+$/.test(x);
// > false

// enzyme ligand copper - InChIKey: RYGMFSIKBFXOCR-UHFFFAOYSA-N
// Checking for 27 characters and a dash at index 14 and index 25 accepts it:
var x = 'RYGMFSIKBFXOCR-UHFFFAOYSA-N';
27 === x.length && '-' === x[14] && '-' === x[25] && /^[A-Z\-]+$/.test(x);
// > true

// Collisions do occur:
// see http://www.chemconnector.com/2011/09/01/an-inchikey-collision-is-discovered-and-not-based-on-stereochemistry/
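
// The length and dash checks can also be folded into a single anchored regex.
// What follows is a minimal sketch, not part of the official specification:
// INCHIKEY_RE and parseInChIKey are hypothetical names, and the meaning given
// to the last three letters ('S' = standard InChI, version letter, protonation
// letter) is an assumption about the 14-10-1 form. It only checks the shape of
// a key; it does not recompute any hash.

```javascript
// Assumed 14-10-1 layout: 14 letters, dash, 8 + 1 + 1 letters, dash, 1 letter.
var INCHIKEY_RE = /^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$/;

// Returns the parts of a well-formed key, or null if the shape does not match.
function parseInChIKey(key) {
  var m = INCHIKEY_RE.exec(key);
  if (!m) {
    return null;
  }
  return {
    connectivity: m[1],      // 14-letter hash of the molecular skeleton
    remainder: m[2],         // 8-letter hash of the remaining layers
    standard: m[3] === 'S',  // assumption: 'S' marks a standard InChI
    version: m[4],           // assumption: version letter
    protonation: m[5]        // assumption: protonation letter
  };
}

var copper = parseInChIKey('RYGMFSIKBFXOCR-UHFFFAOYSA-N');
var oldLayout = parseInChIKey('AAAAAAAAAAAAAA-BBBBBBBBCD'); // null, 25-character layout
```

// Capturing groups keep the blocks available for later use, for example to
// compare only the connectivity block of two keys.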