My initial response was:
Because it is straightforward to not engage in writing systems and applications that can be used to aid prejudice and foster division but its very hard to avoid modelling and designing into data stores categorisations that can be used in ways which are prejudicial to the owners of that personal data.
But what about needing to know who is affected by this or that prejudice and persecution so we can protect them and improve their lot? Surely we need to collect information to identify those parts of the population that need help? But do you need to count in order to know what is the right way to treat everyone? Is counting and identifying itself the wrong?
For specific needs does someone really need to identify themselves as disabled, or are they really an individual with a requirement?
Personally, I don't categorise myself in any ethnic, religious category on any form and would avoid gender and age if I could. As a modeller, architect, would I argue against collecting this data? I would now.
I would employ all the arguments about not collecting personnel data unnecessarily. Does your application/system require gender to be relevant? Really? Age? Should the provision of public services need ethnic data? And so on.
Really ask if each one of these metadata categories is necessary, bear in mind that each of the categories will likely be from a control list plus 'other' perhaps. What purpose will be served, if its for some broad population statistical use ask how does this category give meaning ful information that actually matters in a statistic. Take Male/Female (I won't fall back on self described gender issues to begin with the traditional simple case should suffice), how does it help knowing someone ticked either box? Will they buy or be interested in a different product or service? Will they want different information, will the content be filtered?
If the answer is yes I'd ask the question, so you wouldn't sell to someone of the wrong gender? Would you only show pink bikes to girls? Carbon fibre drop handlebars to boys? I'd hope not, so how does your case differ?
Are the services, products and information that I'm interested in in any way absolutely connected with my age, gender, ethnicity, mobility? I don't think so. The actual services, products and information might very well have particular characteristics that I want to filter and search by but not necessarily because I possess or share those characteristics.
If you're building recommenders why limit or filter or weight your recommendations based upon any of these largely loose non-authoritative categories? Isn't the behaviour and content far more important. Quite a while ago now, over ten years, we were involved in building a streaming personal video platform and the Advertising people wanted us to include something like 200+ questions on the user's likes and dislikes. That got rejected fairly quickly just in contemplating the registration funnel but it did betray the then assumptions about how to collect and slice and dice population data. Analyse the individual, get as much stuff about the individual from the individual and apply that to the content, product or service. It came straight out of the publishing industry with the cards for subscribers to punch or circle their characteristics.
Now of course we apply it the other way round, the behaviour of the individual and their peers or group, their history of success and failure or abandonment, the content/product/service chosen by that group along with many other dimensions of behaviour to new and evolving content, products and services, all that Big Data stuff. And we don't really need all that categorisation up front, in fact it could skew the results badly.
But for those organisations whose data sets were modelled and collected well before this Big Data magic and they had all this carefully researched (or not) data, has it been reevaluated? Or is it sitting there in the rest of the data but consciously or unconsciously splitting your data sets into what might be irrelevant and even misleading subsets.
I think a great many of these characteristics should not be collected and stored and they should all be reevaluated periodically. I include official forms in this, actually I especially include official forms in this.
I'd be interested in any comments or counter arguments people have.
CC BY-SA 4.0