最近在研究"一致性HASH算法"(Consistent Hashing),用于解決memcached集群中當(dāng)服務(wù)器出現(xiàn)增減變動(dòng)時(shí)對(duì)散列值的影響。后來 在JAVAEYE上的一篇文章中,找到了其中的 KetamaHash 算法的JAVA實(shí)現(xiàn)(一種基于虛擬結(jié)點(diǎn)的HASH算法),于是為了加深理解,對(duì)照 JAVA版本,用C#重寫了一個(gè)。放到這里,如果大家感興趣的話, 可以下載測(cè)試一下,如果發(fā)現(xiàn)寫法有問題請(qǐng)及時(shí)告之我,以便我及時(shí)修正。 ![]() ![]() Ketama is an implementation of a consistent hashing algorithm, meaning you can add or remove servers from the memcached pool without causing a complete remap of all keys.
Here’s how it works: * Take your list of servers (eg: 1.2.3.4:11211, 5.6.7.8:11211, 9.8.7.6:11211) * Hash each server string to several (100-200) unsigned ints * Conceptually, these numbers are placed on a circle called the continuum. (imagine a clock face that goes from 0 to 2^32) * Each number links to the server it was hashed from, so servers appear at several points on the continuum, by each of the numbers they hashed to. * To map a key->server, hash your key to a single unsigned int, and find the next biggest number on the continuum. The server linked to that number is the correct server for that key. * If you hash your key to a value near 2^32 and there are no points on the continuum greater than your hash, return the first server in the continuum. If you then add or remove a server from the list, only a small proportion of keys end up mapping to different servers.
下面是添加結(jié)點(diǎn)時(shí): 如這篇文章所說(總結(jié)一致性哈希(Consistent Hashing) ): 了解了原理,下面看一下具體實(shí)現(xiàn)。 JAVA實(shí)現(xiàn)代碼取自Spy Memcached client中的類,原文的作者也盡可能的對(duì)代碼進(jìn)行注釋說明,所以讓我剩了不少時(shí)間。 ![]() ![]() public class KetamaNodeLocator
{ //原文中的JAVA類TreeMap實(shí)現(xiàn)了Comparator方法,這里我圖省事,直接用了net下的SortedList,其中Comparer接口方法) private SortedList<long, string> ketamaNodes = new SortedList<long, string>(); private HashAlgorithm hashAlg; private int numReps = 160; //此處參數(shù)與JAVA版中有區(qū)別,因?yàn)槭褂玫撵o態(tài)方法,所以不再傳遞HashAlgorithm alg參數(shù) public KetamaNodeLocator(List<string> nodes, int nodeCopies) { ketamaNodes = new SortedList<long, string>(); numReps = nodeCopies; //對(duì)所有節(jié)點(diǎn),生成nCopies個(gè)虛擬結(jié)點(diǎn) foreach (string node in nodes) { //每四個(gè)虛擬結(jié)點(diǎn)為一組 for (int i = 0; i < numReps / 4; i++) { //getKeyForNode方法為這組虛擬結(jié)點(diǎn)得到惟一名稱 byte[] digest = HashAlgorithm.computeMd5(node + i); /** Md5是一個(gè)16字節(jié)長(zhǎng)度的數(shù)組,將16字節(jié)的數(shù)組每四個(gè)字節(jié)一組,分別對(duì)應(yīng)一個(gè)虛擬結(jié)點(diǎn),這就是為什么上面把虛擬結(jié)點(diǎn)四個(gè)劃分一組的原因*/ for (int h = 0; h < 4; h++) { long m = HashAlgorithm.hash(digest, h); ketamaNodes[m] = node; } } } } public string GetPrimary(string k) { byte[] digest = HashAlgorithm.computeMd5(k); string rv = GetNodeForKey(HashAlgorithm.hash(digest, 0)); return rv; } string GetNodeForKey(long hash) { string rv; long key = hash; //如果找到這個(gè)節(jié)點(diǎn),直接取節(jié)點(diǎn),返回 if (!ketamaNodes.ContainsKey(key)) { //得到大于當(dāng)前key的那個(gè)子Map,然后從中取出第一個(gè)key,就是大于且離它最近的那個(gè)key 說明詳見: http://www./topic/684087 var tailMap = from coll in ketamaNodes where coll.Key > hash select new { coll.Key }; if (tailMap == null || tailMap.Count() == 0) key = ketamaNodes.FirstOrDefault().Key; else key = tailMap.FirstOrDefault().Key; } rv = ketamaNodes[key]; return rv; } }
![]() ![]() public class HashAlgorithm
{ public static long hash(byte[] digest, int nTime) { long rv = ((long)(digest[3 + nTime * 4] & 0xFF) << 24) | ((long)(digest[2 + nTime * 4] & 0xFF) << 16) | ((long)(digest[1 + nTime * 4] & 0xFF) << 8) | ((long)digest[0 + nTime * 4] & 0xFF); return rv & 0xffffffffL; /* Truncate to 32-bits */ } /** * Get the md5 of the given key. */ public static byte[] computeMd5(string k) { MD5 md5 = new MD5CryptoServiceProvider(); byte[] keyBytes = md5.ComputeHash(Encoding.UTF8.GetBytes(k)); md5.Clear(); //md5.update(keyBytes); //return md5.digest(); return keyBytes; } }
上面三行分別是正常情況,節(jié)點(diǎn)增加,節(jié)點(diǎn)刪除情況下的節(jié)點(diǎn)數(shù)目。下面兩行表示在節(jié)點(diǎn)增加和刪除情況下,同一個(gè)key分配在相同節(jié)點(diǎn)上的比例(命中率)。 后來我修改了幾次增刪結(jié)點(diǎn)的數(shù)量,基本驗(yàn)證了JAVA那位仁兄所說的那樣:
|
|