tagging

Bob Boiko on the metatorial process

Another great speaker at the WritersUA conference was Bob Boiko. If you have not read his Content Management Bible, look it up and read it online. Yes, all 800-some pages. He loves indexing and tagging data, so knowing his work and quoting him when you talk to clients can help establish a common vocabulary.

His talk this year was about the importance of the metatorial process. It is like the editorial process, but for your content's metadata. His strategy when approaching large bodies of data that need a schema is to ask these questions:
What underlying structures can be behind the surface structures we need?
How will we tag items so that they are part of the structures?
How much time and how many resources do we expect to need to get the backlog tagged and to keep tagging on an ongoing basis?
How will we review, evaluate, and renew our approach?

He says that professionally tagging content is more important than social network tagging, because the content has to participate in another structure outside of the social one, and you must have consistency at the base of the information.
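To make the consistency point concrete, here is a minimal sketch of mapping free-form tags onto a controlled vocabulary. It is my own illustration, not anything Boiko showed: the vocabulary, the variant lists, and the function names are all hypothetical.

```python
# Hypothetical controlled vocabulary: preferred term -> known variants.
CONTROLLED_VOCABULARY = {
    "content management": {"cms", "content-management", "content mgmt"},
    "metadata": {"meta data", "meta-data"},
    "indexing": {"index", "indexes"},
}


def build_lookup(vocabulary):
    """Map every preferred term and every variant to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred] = preferred
        for variant in variants:
            lookup[variant] = preferred
    return lookup


def normalize_tags(raw_tags, lookup):
    """Return (accepted, rejected): preferred terms, and tags that need review."""
    accepted, rejected = [], []
    for raw in raw_tags:
        # Lowercase and collapse whitespace before looking the tag up.
        key = " ".join(raw.strip().lower().split())
        preferred = lookup.get(key)
        if preferred is None:
            rejected.append(raw)          # unknown term: send to a human editor
        elif preferred not in accepted:
            accepted.append(preferred)    # keep one copy of each preferred term
    return accepted, rejected


lookup = build_lookup(CONTROLLED_VOCABULARY)
accepted, rejected = normalize_tags(["CMS", "Meta-Data", "cms", "kittens"], lookup)
print(accepted)
print(rejected)
```

The rejected list is where the metatorial review would come in: someone has to decide whether an unknown tag is a new variant, a new concept, or noise.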

"Information strategy tells you what you had better be doing. Information structure tells you how you had better be doing it."

He feels that content writers and structurers (and that includes us) need to take control back from the IT people and drive the metatorial process.

Boiko is always worth reading and hearing if he comes near your town.

He has a course up at winhost.ischool.washington.edu/courseBook. Go take a look!

Mal Booth at ANZSI conference

Mal Booth gave the opening sessions at the ANZSI conference in Sydney yesterday. Great speaker, and there is no way his slide set captures his speech, but it is available at http://www.slideshare.net/malbooth/miscellaneous-connections

One page tagging manifesto

See the whole picture here.

Merlin Mann on getting things done

Merlin Mann runs the 43 Folders blog, and recently spoke at the Maximum Fun conference on being creative and getting things done. It was, to say the least, an interesting speech (a little raunchy, a little geeky), but in the middle section he describes all the things we do to stop ourselves. As I listened to that portion, I remembered first starting out in indexing and freezing, unable to write an entry for my first professional job. "I can't do this!" Merlin talks about that moment and how to get past it, as well as how a new iPhone isn't going to help you get your novel written. As an added plus, he talks about tagging and the taxonomy that the Seduction Society is using. Who would connect those two worlds other than Mann?

Worth listening to here.

Taxonomies and tags are political, or #Amazonfail

A week or so ago, some authors noticed that their GLBT-related books on Amazon had lost the sales ranking figures that Amazon uses to rate books as "most popular" or "most copies sold." These rankings, whether your book is #1 or number 678,900, are rather important -- they can determine whether your book is shown or not when someone types in a search for a subject, and then chooses to rank the results by sales ranking to see which are the best selling books for a topic.

Complaints, firestorms on Twitter, blog posts, and general mayhem ensued. Many people thought it was an act of out-and-out bias, especially since gay marriage in the States has been a big newsmaker this past month. Authors asked Amazon what was going on, and received a tepid response. Evidently the powers that be at Amazon had decided to stop displaying books tagged "adult" in their rankings. And somehow, this "adult" tag included books like Heather Has Two Mommies and Lady Chatterley's Lover, not just adult books. It meant that if you searched on "homosexuality", your results would show only anti-homosexuality books and items. Several blogs took screen captures and posted them. Many people went into Amazon's book listings and started tagging books with #amazonfail. User tagging as a protest tool! Twitter posts with the #amazonfail tag spread quickly as well.

After reading a ton of articles and postings about this mess, I think I agree with Patrick at Making Light:

I’d bet lunch that the sequence of events, in its simplest form, went something like this:

(1) Sometime in the middle-distance past—maybe a couple of months ago, maybe a year, it doesn’t matter—somebody decided that it would be a good idea to make sure that works of straight-out pornography (or, for that matter, sex toys) didn’t inadvertently show up as the top result for innocuous search queries. (The many ways that this could happen are left as an exercise for Making Light’s commentariat.) A policy was promulgated that “adult” items would be removed from the sales rankings and thus rendered invisible to general search.

(2) Sometime more recently, an entirely different group of people were given the task of deciding what things for sale on Amazon should be tagged “adult,” but in the journey from one department to another, and from one level of the hierarchy to another, the directive mutated from “let’s discreetly unrank the really raunchy stuff” to “we’d better be careful to put an ‘adult’ tag on anything that could imaginably offend anyone.” Indeed, as Teresa pointed out, it’s entirely possible that someone used a canned list of “adult” titles supplied from outside, something analogous to the lists of URLs sold by “net nanny” outfits, which would account for the newly-unranked status of works like Lady Chatterley’s Lover. (As one net commenter observed, “What is this, 1928?”)


I have found when doing taxonomy that it is an activity with almost no neutral ground. Every decision has its opponents, and you have to build consensus for a particular worldview when you are working with groups who see the world differently, and that's nearly every group of more than two people. Even when I was working in a relatively calm area like PC hardware or software tasks, where you would think a printer and a monitor obviously belong in different categories, I heard valid arguments for why they were the same! "It depends," as we always say about indexing.

Things are starting to get fixed. Some recent searches under homosexuality on Amazon were starting to show more normal results, so I think the #amazonfail tagging effort has had some effect and Amazon is doing something about this, after their feeble first response. The Seattle PI has a response from Amazon's Drew Herdener:

This is an embarrassing and ham-fisted cataloging error for a company that prides itself on offering complete selection.

It has been misreported that the issue was limited to Gay & Lesbian themed titles – in fact, it impacted 57,310 books in a number of broad categories such as Health, Mind & Body, Reproductive & Sexual Medicine, and Erotica. This problem impacted books not just in the United States but globally. It affected not just sales rank but also had the effect of removing the books from Amazon's main product search.

Many books have now been fixed and we're in the process of fixing the remainder as quickly as possible, and we intend to implement new measures to make this kind of accident less likely to occur in the future.


Amazon does need to look at its taxonomy structures and labeling, and see where they might be failing. You cannot let machine algorithms replace human sensibility. I think Amazon is importing tags from publishers, and probably importing taxonomies. At a session years ago I heard from an employee that they had let all of their fact-checking people go, and now rely on users and publishers to supply correct and corrected data for all of their bibliographic information. That eliminated 400 jobs. Libraries I knew had stopped their subscriptions to Books In Print, thinking Amazon would be easier and faster and just as good, not realizing that it is full of errors until corrected. We have all seen examples of wrong covers for books, or indexes for the first edition showing up in the second edition's listings. I would bet they are relying on publishers for taxonomic structures as well, but I don't know for sure. Probably piecemeal, using them in places and fine-tuning them in others.
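Here is a toy sketch of how this kind of failure could happen. To be clear, this is a guess at the mechanism, not Amazon's actual code or data: the category names, titles, sales figures, and flag lists are all hypothetical. The point is only that the filter never changes. All that changes is how widely someone drew the "adult" list.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    categories: set
    sales: int


# Hypothetical flag lists: the intended one, and one that grew too broad.
NARROW_ADULT = {"erotica"}
BROAD_ADULT = {"erotica", "sexual medicine", "gay & lesbian"}


def ranked_search(books, query_category, adult_categories):
    """Return titles in a category, best seller first, skipping flagged books."""
    visible = [
        book for book in books
        if query_category in book.categories
        # A book is hidden if ANY of its categories is on the adult list.
        and not (book.categories & adult_categories)
    ]
    visible.sort(key=lambda book: book.sales, reverse=True)
    return [book.title for book in visible]


books = [
    Book("Family picture book", {"children", "gay & lesbian"}, 900),
    Book("Explicit novel", {"erotica"}, 700),
    Book("Parenting guide", {"children"}, 400),
]

print(ranked_search(books, "children", NARROW_ADULT))
print(ranked_search(books, "children", BROAD_ADULT))
```

With the narrow list, the picture book tops the children's results. With the broad list, it vanishes, and nobody touched the search code to make that happen.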

As Laura Dawson says:

I've done so much taxonomy work, both for Muze and BN.com - and my colleagues and I have all agonized over the political decisions we've had to make because in a taxonomy you have to articulate concepts and arrange them. Like staying-awake-at-night agonizing, because these articulations and arrangements either bring books to light or tuck them away where few can find them, depending. (Richard Nash also makes a great point up this same alley.)

And it's worth getting upset about. What happened at Amazon is the result of dozens of small decisions about how to name things and the structure of those names - whether the decisions were made by people at Amazon or they were importing other companies' taxonomies (probably both) or using semantics to create algorithms. Shirky is right in that it probably wasn't a person or group of people deciding that they didn't like gay people that day. But (as Richard points out) it was the result of heteronormative thinking creating search rules that ultimately resulted in...#amazonfail.

What taxonomizing teaches you is that no worldview is neutral, and the best you can hope for is to keep trying to reach in that direction. Detangling what happened at Amazon is compounded by the fact that they aren't talking to anyone, but it appears to be a compilation of complacent taxonomizing, linking certain concepts to the theme "adult", imposing some sort of filter on the "adult" titles (without realizing what "adult" meant in terms of the terms that linked to it) in a misguided effort to make explicit books less visible, not fully investigating the problem when it first came to Amazon's attention (but dismissing it as a "policy" decision, which is most likely never was in the first place), and now not really responding effectively. Probably because those in charge of responding really have no idea how it happened.


Laura wrote that last bit before Amazon's second response.

Taxonomies and tags are political. Indexing is political. Labeling structures are political. So I wonder what tags I'll use to categorize this post - ;-)

If you want to read up on what happened, and many people's responses, here's a list of blog postings:
Laura Dawson
Clay Shirky
Mary Hodder
Richard Eoin Nash
Jane at Dear Author

Bruce Sterling on tagging and Web 2.0

From a much longer (and very funny) presentation at Webstock 09:

Let's look at a few of these Web 2.0 principles and practices.
"Tagging not taxonomy." Okay, I love folksonomy, but I don't think it's gone very far. There have been books written about how ambient searchability through folksonomy destroys the need for any solid taxonomy. Not really. The reality is that we don't have a choice, because we have no conceivable taxonomy that can catalog the avalanche of stuff on the Web. We have no army of human clerks remotely able to tackle that work. We don't even have permanent reference sites where we can put data so that we can taxonomize it....

"Dynamic content." Okay, content is a stable substance that is put inside a container. It's stored in there: that's why you put it inside. If it is dynamically flowing through the container, that's not a container. That is a pipe. I really like dynamic flowing pipes, but since they're not containers, you can't freakin' label them!


There's a lot more, about the next thing, which he calls the Transitional Web. Worth the read.

Do tags work? By Cathy Marshall

An entertaining study of photo tags on Flickr reveals user tags to be somewhat, um, lacking... Looking at photos of a bull mosaic in Milan, one that has a good luck ritual associated with it, Marshall found that the tags people applied did little to help anyone retrieve the photos. In other words, the average Joe isn't very good at tagging, even for his own data.

The message here is almost painful: a great proportion of user tags add little or no further information; as such, they don't appear as often in narratives or titles. Personal names, which may be quite useful for finding photos among one's own collection (especially over the long haul) are less well represented in all types of metadata, but are relatively similar in quantity.

Now here's a property of tags that I find almost comical: they are seldom verbs, even if a verb is just the thing to characterize a photo. What's unique about what tourists do when they visit the Galleria's bull mosaic? They spin. In fact, if you type in Milan spin as your Flickr search terms, you pull up 94 results, 70 of which are pictures of our bull mosaic. 20 out of 24 results on the first page are on target.

Although spin and spinning make the top 20 list of tags, they are by no means commonly used terms; they are used less than 1% of the time (0.7%). That's just 7 tags. On the other hand, spin makes up 4.8% and 9.5% of title and narrative terms. People just don't seem to be thinking of tags as verbs.
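For anyone who wants to try this kind of comparison on their own collection, here is a rough sketch of the calculation: the share of all terms in a field that come from a given word list. The sample tags and title words below are invented for illustration and are not Marshall's data. (If 7 tags really is 0.7%, her sample would have been on the order of 1,000 tags.)

```python
from collections import Counter


def term_share(terms, target_terms):
    """Fraction of all terms (case-insensitive) that appear in target_terms."""
    counts = Counter(term.lower() for term in terms)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(counts[target] for target in target_terms)
    return hits / total


# Hypothetical sample: ten tags and ten title words from bull-mosaic photos.
tags = ["milan", "italy", "galleria", "bull", "mosaic",
        "milan", "italy", "bull", "spin", "vacation"]
title_terms = ["spinning", "on", "the", "bull", "in",
               "milan", "spin", "for", "luck", "milan"]
verbs = {"spin", "spinning"}

tag_share = term_share(tags, verbs)
title_share = term_share(title_terms, verbs)
print(f"verbs in tags: {tag_share:.1%}, verbs in titles: {title_share:.1%}")
```

A real comparison would need proper tokenizing of titles and narratives, and a better way of spotting verbs than a hand-built word list.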

Tagging vs. indexing

The use of tags by readers has skyrocketed. According to the Pew Internet and American Life Project, January 2007, one third of U.S. Internet users (42 million Americans) had tagged some form of online content. 10 million Americans (7% of Internet users) are tagging content daily. (Gene Smith, Tagging, p. 18)

Here's a new study analyzing tagging practices. Is this "indexing by mob" or is it a valuable source of vocabulary?

PS. I really really recommend this book.

Gene Smith on tagging

Gene Smith, author of a great book called "Tagging: People Powered Metadata for the Social Web," has a nice presentation up on SlideShare.net.


It's worth taking a look even without any notes. Highly recommended book, too.