Friday, July 29, 2011

Falsehoods Programmers Believe About Names | Kalzumeus Software

Posted on June 17, 2010 in Uncategorized

John Graham-Cumming wrote an article today complaining about how a computer system he was working with described his last name as having invalid characters.  It of course does not, because anything someone tells you is their name is — by definition — an appropriate identifier for them.  John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.

I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them.  (Most people call me Patrick McKenzie, but I’ll acknowledge as correct any of six different “full” names, and many systems I deal with will accept precisely none of them.) Similarly, I’ve worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them.  I have never seen a computer system which handles names properly and doubt one exists, anywhere.

So, as a public service, I’m going to list assumptions your systems probably make about names.  All of these assumptions are wrong.  Try to make fewer of them next time you write a system which touches names.

  1. People have exactly one canonical full name.
  2. People have exactly one full name which they go by.
  3. People have, at this point in time, exactly one canonical full name.
  4. People have, at this point in time, one full name which they go by.
  5. People have exactly N names, for any value of N.
  6. People’s names fit within a certain defined amount of space.
  7. People’s names do not change.
  8. People’s names change, but only at a certain enumerated set of events.
  9. People’s names are written in ASCII.
  10. People’s names are written in any single character set.
  11. People’s names are all mapped in Unicode code points.
  12. People’s names are case sensitive.
  13. People’s names are case insensitive.
  14. People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
  15. People’s names do not contain numbers.
  16. People’s names are not written in ALL CAPS.
  17. People’s names are not written in all lower case letters.
  18. People’s names have an order to them.  Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
  19. People’s first names and last names are, by necessity, different.
  20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
  21. People’s names are globally unique.
  22. People’s names are almost globally unique.
  23. Alright alright but surely people’s names are diverse enough such that no million people share the same name.
  24. My system will never have to deal with names from China.
  25. Or Japan.
  26. Or Korea.
  27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
  28. That Klingon Empire thing was a joke, right?
  29. Confound your cultural relativism!  People in my society, at least, agree on one commonly accepted standard for names.
  30. There exists an algorithm which transforms names and can be reversed losslessly.  (Yes, yes, you can do it if your algorithm returns the input.  You get a gold star.)
  31. I can safely assume that this dictionary of bad words contains no people’s names in it.
  32. People’s names are assigned at birth.
  33. OK, maybe not at birth, but at least pretty close to birth.
  34. Alright, alright, within a year or so of birth.
  35. Five years?
  36. You’re kidding me, right?
  37. Two different systems containing data about the same person will use the same name for that person.
  38. Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
  39. People whose names break my system are weird outliers.  They should have had solid, acceptable names, like 田中太郎.
  40. People have names.
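
As a small illustration of items 12, 13 and 38, here is a minimal Python sketch using only the standard library. The sample names are made up for the example. It shows that two strings which look identical on screen need not be bitwise equal, and that case mapping is not a safe comparison tool either. Normalization can help when matching records, but it is an assumption of this sketch that NFC is the right form for a given system, and it does not make the other items on the list go away.

```python
import unicodedata

# Two ways to enter the same visible name:
# a precomposed "é" versus a plain "e" followed by a combining acute accent.
composed = "Ren\u00e9e"
decomposed = "Rene\u0301e"

# The two strings render identically but are not bitwise equal.
bitwise_equal = composed == decomposed

# Normalizing both sides to NFC before comparing makes them match.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# Case mapping can even change the length of a name: "ß" upper-cases to "SS".
upper_changes_length = len("Weiß".upper()) != len("Weiß")

print(bitwise_equal, nfc_equal, upper_changes_length)
```

If normalization is used at all, it is probably safer to apply it only to a copy used for matching, and to keep the name exactly as the person entered it.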

This list is by no means exhaustive.  If you need examples of real names which disprove any of the above commonly held misconceptions, I will happily introduce you to several.  Feel free to add other misconceptions in the comments, and refer people to this post the next time they suggest a genius idea like a database table with a first_name and last_name column.
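
For contrast with the first_name and last_name table, here is one possible sketch of a looser record in Python. The class names, field names and fallback text are all hypothetical, and this is not a design proposed in the article. It only assumes that identity comes from a key rather than the name, that a person may have zero, one or many names, and that each name is kept as one free-form string.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class NameRecord:
    """One name a person has given us, stored exactly as entered."""
    text: str                      # free-form; no splitting into parts
    context: str = "unspecified"   # hypothetical label such as "legal" or "preferred"
    since: Optional[date] = None   # when we learned of this name, if known


@dataclass
class Person:
    person_id: int                 # identity comes from the key, not from the name
    names: List[NameRecord] = field(default_factory=list)  # zero, one, or many

    def display_name(self, fallback: str = "(no name on file)") -> str:
        """Return the most recently added name, or the fallback if there is none."""
        return self.names[-1].text if self.names else fallback


nameless = Person(person_id=1)
tanaka = Person(person_id=2, names=[NameRecord("田中太郎", context="preferred")])

print(nameless.display_name())
print(tanaka.display_name())
```

Picking the most recently added name for display is itself an assumption. A real system would more likely ask the person which name they want shown.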

If you deal with code or systems that keep databases, this is worth the read. I do health care IT, and I think probably only the government and the credit card industry have more contact with this problem.

I've written many, many algorithms to deal with names and I agree with every item on this list.

For anyone still reading who doesn't deal with this kind of system, here's an example I had to deal with. Most systems in the US assume either one long string (50 characters, really?) or three separate ones. Okay. Then deal with this name... Little John Running Bear - the third. Which one is a last name? Are you sure? Which is the middle initial? How do you differentiate "him" from dad and grandfather? They all list their address as "general delivery" at the reservation post office. Oh, and he also goes by "Little John", "Little Bear" and "R.B. Three". Good luck.

(BTW this is not a real name AFAIK but it's very close to one and an exact functional example from my past)
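
To make the problem concrete, here is a rough Python sketch of the kind of first/middle/last splitter that the three-field systems imply. The function `naive_split` is hypothetical, and writing "the third" as "III" is an assumption made for the example.

```python
from typing import Tuple


def naive_split(full_name: str) -> Tuple[str, str, str]:
    """Hypothetical splitter: first token, everything in between, last token."""
    parts = full_name.split()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return first, middle, last


first, middle, last = naive_split("Little John Running Bear III")
# first  -> "Little"
# middle -> "John Running Bear"
# last   -> "III"   (the suffix has been filed as the last name)
print(first, "|", middle, "|", last)
```

Nothing in the string says which tokens belong together, so any fixed rule like this one will guess wrong for somebody.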

Posted via email from ninjahippie's (pre) posterous

Monday, July 25, 2011

Victory for evolution in Texas | NCSE

Pop the champagne corks. The Texas Board of Education has unanimously come down on the side of evolution. In a 14-0* vote, the board today approved scientifically accurate high school biology textbook supplements from established mainstream publishers--and did not approve the creationist-backed supplements from International Databases, LLC.

"This is a huge victory for Texas students and teachers," said Josh Rosenau, NCSE programs and policy director, who testified at the hearings this week. In his testimony, Rosenau urged the board to approve the supplements--recommended by a review panel largely composed of scientists and science educators--without amendments, and to reject International Databases' creationist submission. The board did just that, and asked for only minimal changes to the approved supplements.

In hearings yesterday, NCSE members and allies showed up in force. At least four times as many people testified in favor of the supplements as written, versus those opposing the supplements or demanding significant changes.

One hot button: the supplement from Holt McDougal. A creationist member of the review panel released a list of Holt's supposed errors involving evolution and common descent. But in today's hearing, the Texas Education Agency pointed out that the full membership of the review panel had not signed off on the list.

Ultimately, the board approved the Holt supplement, and directed Commissioner of Education Robert Scott to review the list of supposed errors, and to develop amended language for Holt to incorporate. NCSE and Texas education groups are confident Scott's revisions will reflect the current state of evolutionary biology, and not any creationist alternatives.

Dr. Eugenie Scott, NCSE's Executive Director, is celebrating the decision. "These supplements reflect the overwhelming scientific consensus that evolution is the core of modern biology, and is a central and vital concept in any biology class. That these supplements were adopted unanimously reflects a long overdue change in the board. I commend the board for its refusal to politicize science education."

* Correction: This story initially reported the vote as 8-0. The board has 15 members, with one (Mary Helen Berlanga) away on vacation.

There is hope. Way to go Texas school board!

Friday, July 22, 2011

Google Music Manager Finally Launches On Linux!


Woot! I understand that the Linux version will upload OGG files where the Windows version did not. There goes my bandwidth for the weekend...

Tuesday, July 19, 2011