One Mistake Sequence

I'm working a lot with URLs that contain ids and very often, I made a mistake in one digit of the long id and end up with a completely different element. If I don't pay attention, then I end up looking at two elements thinking they are the same and am intrigued until I find out the mistake.

In order to avoid that, I wanted to know if I could make a sequence of ids where making one mistake would not give a valid id. For example, if you have the id 473, then you have to black list all the ids where the first digit is wrong (073, 173, 273, 373, 573, 673, 773, 873, 973) and the second being wrong (403, 413, 423, 433, 443, 453, 463, 483, 493) and the last one (470, 471, 472, 474, 475, 476, 477, 478, 479).

Here is the list of the first pseudo-numbers:

  1. 011
  2. 022
  3. 033
  4. 044
  5. 055
  6. 066
  7. 077
  8. 088
  9. 099
  10. 101
  11. 110
  12. 123
  13. 132
  14. 145
  15. 154
  16. 167
  17. 176
  18. 189
  19. 198
  20. 202
  21. 213
  22. 220
  23. 231
  24. 246
  25. 257
  26. 264
  27. 275
  28. 303
  29. 312
  30. 321
  31. 330
  32. 347
  33. 356
  34. 365
  35. 374
  36. 404
  37. 415
  38. 426
  39. 437
  40. 440
  41. 451
  42. 462
  43. 473
  44. 505
  45. 514
  46. 527
  47. 536
  48. 541
  49. 550
  50. 563
  51. 572
  52. 606
  53. 617
  54. 624
  55. 635
  56. 642
  57. 653
  58. 660
  59. 671
  60. 707
  61. 716
  62. 725
  63. 734
  64. 743
  65. 752
  66. 761
  67. 770
  68. 808
  69. 819
  70. 880
  71. 891
  72. 909
  73. 918
  74. 981
  75. 990

And here is a visual representation of those numbers:

For each digit in the number, we blacklist 9 other numbers. For a 5 digits number (eg 12345), that means blacklisting 45 numbers. For a 10 digits number (eg 1234567890), that means blacklisting 90 numbers. The number of blacklisted numbers only grows at a logarithmic scale.

In order to see how many numbers we lose, I plotted the ratio of pseudo numbers count compared to the real numbers. We can roughly keep one number every fifteen. But the good news is that the ratio doesn't fall off the chart as the numbers grow.

Looking at the numbers, they looked like to go from 10 to 10 but with some huge spikes and sometimes they were closer. So I plotted the difference between two consecutive numbers in a chart and it looks like the difference is centered around 10. But the variance is getting higher and higher as you move further.

I'm not really sure if this sequence can be really useful in practice but that was a fun week-end experiment. I hope it'll give you some, hopefully useful, ideas 🙂

If you liked this article, you might be interested in my Twitter feed as well.
 
  • http://paulisageek.com Paul Tarjan

    This is called a code with Hamming distance 1.

    http://en.wikipedia.org/wiki/Hamming_distance

  • David

    It's actually (minimum) Hamming distance 2. In general, an e-error-correcting, f-error-detecting code has minimum Hamming distance at least e+f+1. This is a 0-error-correcting, 1-error detecting code, so the minimum Hamming distance is at least 2.

    An easier construction is to use the first two digits as message digits and the final one as a parity digit: if the id is 57, use 572 in your URL (2 because 5 + 7 = 2 mod 10). That will give you 100 codewords ("pseudo-numbers") instead of the 76 in your construction, and it'll still be 0-error-correcting, 1-error-detecting.

  • http://www.facebook.com/vjeux Christopher Chedeau

    Thanks for the explanation. It makes me realize that the construction I am using is a greedy way to solve the problem. I start at 0, blacklist all the elements and find the next valid one in increasing order. If I picked another one, I could find more codewords in the same space.

    Now the question is, do you know how to find the maximum number of codewords we can make for a given number of digits and how to construct that sequence. Is it the modulo? If yes, how do you generalize it to more than 2 digits?

    Thanks!

  • David

    The same construction works for more digits. For example, you would encode 88273 to 882738, since 8+8+2+7+3=8 (mod 10). This parity check code produces the maximum possible number of codewords for a 1-error-detecting code. If you're skeptical, you can check the Wikipedia article 'Singleton bound' for mathematical proof.

  • http://www.facebook.com/vjeux Christopher Chedeau

    Thanks a lot 🙂

 

Related Posts

  • August 27, 2011 Start a technical blog, it’s worth it! (4)
    Lately, I've been advocating to all my student friends to start a blog. Here's an article with the most common questions answered :) What are the benefits? Being known as an expert. The majority of my blog posts are about advanced Javascript topics. As a result, I'm being tagged as […]
  • October 8, 2011 Copy SQL Row Changing ID (2)
    I've come across an SQL issue. I need to make a fake spell for the WoW database. However creating one from scratch is too annoying, there are like 30 non-NULL values to fill. Instead what I want is to copy an existing spell with the new id. It appeared to be less trivial than […]
  • September 17, 2010 Starcraft 2 Custom Map Popularity Listing (0)
    Popularity System In Starcraft 2, the Custom Maps are being listed by popularity. The popularity is the number of times the map has been played for more than 5 minutes during the last 12 hours. As I am running the website sc2mapster.com, I would like to have this listing. This […]
  • September 22, 2011 URLON: URL Object Notation (43)
    #json, #urlon, #rison { width: 100%; font-size: 12px; padding: 5px; height: 18px; color: #560061; } I am in the process of rewriting MMO-Champion Tables and I want a generic way to manage the hash part of the URL (#table__search_results_item=4%3A-slot). I no longer […]
  • September 11, 2011 World of Warcraft HTML Tooltip Diff (0)
    MMO-Champion is a World of Warcraft news website. When a new patch is released, we want to show what has changed in the game (Post Example). An english summary of each spell change is hand written, but we want to show the exact tooltip changes. jsHTMLDiff is available on […]