<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Need To Know</title>
	<atom:link href="http://www.coblentzclan.com/needtoknow/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.coblentzclan.com/needtoknow</link>
	<description>About Search, Indexing, Classification, Taxonomies, and other topics around information science.</description>
	<lastBuildDate>Mon, 07 Jul 2008 19:17:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Stats don&#8217;t lie, do they?</title>
		<link>http://www.coblentzclan.com/needtoknow/?p=17</link>
		<comments>http://www.coblentzclan.com/needtoknow/?p=17#comments</comments>
		<pubDate>Fri, 27 Jun 2008 01:10:19 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[AJAX]]></category>
		<category><![CDATA[enterprise 2.0]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.coblentzclan.com/needtoknow/?p=17</guid>
		<description><![CDATA[I was cleaning out the old cubicle and finally sat down to go through the magazines.  You know, the ones you keep meaning to read but never really get around to.
Well, they&#8217;re old.  I&#8217;m not going to read the really old stuff but some of them were dog-eared (pages bent over to mark [...]]]></description>
			<content:encoded><![CDATA[<p>I was cleaning out the old cubicle and finally sat down to go through the magazines.  You know, the ones you keep meaning to read but never really get around to.</p>
<p>Well, they&#8217;re old.  I&#8217;m not going to read the really old stuff but some of them were dog-eared (pages bent over to mark them).  I couldn&#8217;t remember why I did that so I had to read the article again.  I came across this set of &#8220;stats&#8221; in the May 5, 2008 Information Week issue about analytics (page 6):</p>
<p style="padding-left: 30px;">&#8220;The Information Week 500 recognizes the most innovative business technology companies.  So, what defines an innovative organization?  A look at the 500 companies that made the list last year finds:</p>
<ul>
<li>54% have deployed AJAX development tools</li>
<li>43% use business intelligence to boost productivity</li>
<li>27% communicate with customers via wikis, blogs, and social networking.&#8221;</li>
</ul>
<p>Given our interest in Enterprise 2.0 developments, these are interesting stats.  I hope they are meaningful and related to what we are doing or otherwise I am just having &#8220;happy ears&#8221; and hearing what I want to hear&#8230;</p>
<p><em>(Note: Dear InformationWeek &#8211; if there is a link to this article/item on your site, I&#8217;d be happy to use that rather than type out the text &#8211; I couldn&#8217;t find it though). </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coblentzclan.com/needtoknow/?feed=rss2&amp;p=17</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>We interrupt this scheduled broadcast&#8230;</title>
		<link>http://www.coblentzclan.com/needtoknow/?p=16</link>
		<comments>http://www.coblentzclan.com/needtoknow/?p=16#comments</comments>
		<pubDate>Wed, 25 Jun 2008 22:26:27 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[enterprise 2.0]]></category>
		<category><![CDATA[search engines]]></category>

		<guid isPermaLink="false">http://www.coblentzclan.com/needtoknow/?p=16</guid>
		<description><![CDATA[I need to take a short break from the session about search and EMC.  I came across this entry in the blogosphere:
The Top 100 Alternative Search Engines, written by Charles Knight, AltSearchEngines editor / January 29, 2007  2:34 AM
To be very honest with you, I don&#8217;t use nor have I tried all [...]]]></description>
			<content:encoded><![CDATA[<p>I need to take a short break from the session about search and EMC.  I came across this entry in the blogosphere:</p>
<p style="padding-left: 30px;"><a href="http://www.readwriteweb.com/archives/top_100_alternative_search_engines.php">The Top 100 Alternative Search Engines</a>, written by <a href="http://www.readwriteweb.com/about_charles.php">Charles Knight, AltSearchEngines editor</a> / January 29, 2007  2:34 AM</p>
<p>To be very honest with you, I don&#8217;t use, nor have I tried, all of these different engines.  I think they actually validate my other point: that you should use the engine and technique suited to the problem at hand.  Some of them are fun to play with (although <a href="http://www.msdewey.com/">Ms. Dewey</a> is a bit loud for work) and, for the most part, they demonstrate the basic elements of search &#8211; user interaction and visualization, keywords, ranking, relevancy, clustering, relationship tracking, social search, etc.</p>
<p>I think I will have to add in a whole posting thread about the engines listed; not to duplicate what Mr. Knight has done but hopefully to extend it.</p>
<p>Take a look and see for yourself.  Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coblentzclan.com/needtoknow/?feed=rss2&amp;p=16</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Tag clouds as art</title>
		<link>http://www.coblentzclan.com/needtoknow/?p=15</link>
		<comments>http://www.coblentzclan.com/needtoknow/?p=15#comments</comments>
		<pubDate>Fri, 20 Jun 2008 23:18:18 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[art]]></category>
		<category><![CDATA[tag clouds]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://www.coblentzclan.com/needtoknow/?p=15</guid>
		<description><![CDATA[Well, this isn&#8217;t related to search exactly, but it is an interesting visualization of the topic.  I came across an unusual site, Wordle, and pasted the last post into the site.  You get some amazing tag cloud layouts.  Frankly, this is art more than anything but that&#8217;s what art does: it challenges your perspectives. [...]]]></description>
			<content:encoded><![CDATA[<p>Well, this isn&#8217;t related to search exactly, but it is an interesting visualization of the topic.  I came across an unusual site, Wordle, and pasted the last post into the site.  You get some amazing tag cloud layouts.  Frankly, this is art more than anything but that&#8217;s what art does: it challenges your perspectives.</p>
<p><a href="http://wordle.net/gallery/04887/Search" title="Wordle: Search"><img src="http://wordle.net/thumb/04887/Search" style="padding:4px;border:1px solid #ddd"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coblentzclan.com/needtoknow/?feed=rss2&amp;p=15</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Form follows function</title>
		<link>http://www.coblentzclan.com/needtoknow/?p=13</link>
		<comments>http://www.coblentzclan.com/needtoknow/?p=13#comments</comments>
		<pubDate>Thu, 12 Jun 2008 21:56:57 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[commodity systems]]></category>
		<category><![CDATA[Content Management]]></category>
		<category><![CDATA[requirements]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine folks]]></category>
		<category><![CDATA[search engine functions]]></category>
		<category><![CDATA[search engine teams]]></category>
		<category><![CDATA[search problem]]></category>
		<category><![CDATA[search question]]></category>
		<category><![CDATA[use cases]]></category>

		<guid isPermaLink="false">http://www.coblentzclan.com/needtoknow/?p=13</guid>
		<description><![CDATA[If your users are looking to search for information and the deployed architecture does not support that style of search, then the system will be crippled overall.   No matter how good a search engine is, if it&#8217;s deployed badly, or if it is unsuited to the type of search that is being asked [...]]]></description>
			<content:encoded><![CDATA[<p>If your users are looking to search for information and the deployed architecture does not support that style of search, then the system will be crippled overall.   No matter how good a search engine is, if it&#8217;s deployed badly, or if it is <em>unsuited</em> to the type of search that is being asked of it, it will not perform well.</p>
<p>The commodity systems that are available on the market have each been designed with a leading purpose in mind.  One question, then, is to divine just what that purpose was and to see if it matches your search problem.  As a former consultant, I find it interesting that many people have approached the problem as a &#8220;feature/function&#8221; comparison and then selected an offering that really doesn&#8217;t match the need.  I don&#8217;t think the fault lies entirely with the search engine folks; much of it lies with buyers not understanding their own problem.  There&#8217;s something inherent about search engine functions and features that makes it difficult to match them to the problem, and it probably stems from a poor understanding of what people are doing in their jobs that forces them to look for information.  <em>(Now that part of the problem </em>does <em>belong to the search engine teams). </em></p>
<p>I&#8217;ll set that thought aside for now and move on to the nature of the use cases around search.  A friend of mine has suggested that there are something like 5 different use cases around search: <em>(Feel free to argue with me on this point).</em></p>
<ol>
<li>Simple File System search</li>
<li>eMail Archiving</li>
<li>Government Data Warehousing</li>
<li>Generic Content Management</li>
<li>Legal Discovery</li>
</ol>
<p>&#8230; but I think this is more like a spectrum of use cases to deal with and looks something like this:</p>
<p><a rel="attachment wp-att-14" href="http://www.coblentzclan.com/needtoknow/2008/06/12/form-follows-function/slide11/"><img class="alignnone size-medium wp-image-14" title="Use case spectrum for search" src="http://www.coblentzclan.com/needtoknow/wp-content/uploads/2008/06/slide11-300x225.gif" alt="" width="300" height="225" /></a></p>
<p>I think the biggest takeaway on this topic is that the amount of metadata is always increasing <em>(when going to the right) </em>and it also becomes more user-defined (ad hoc) and less pre-defined (a priori).  Notice also how the nature of the search question changes across the spectrum.  I think this is the crux of the problem on the buyer&#8217;s side &#8211; they don&#8217;t understand their own question well enough <em>in the context of general search</em>, and then the sellers get into it and voila!  You get a mismatch on the system.</p>
<p>Shouldn&#8217;t be a surprise to anyone.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coblentzclan.com/needtoknow/?feed=rss2&amp;p=13</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What kind of Search?</title>
		<link>http://www.coblentzclan.com/needtoknow/?p=8</link>
		<comments>http://www.coblentzclan.com/needtoknow/?p=8#comments</comments>
		<pubDate>Wed, 04 Jun 2008 23:14:10 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[exhaustive search]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[information search innovation enterprise_2.0]]></category>
		<category><![CDATA[Internet Search]]></category>
		<category><![CDATA[mining]]></category>
		<category><![CDATA[search types]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[unstructured search]]></category>
		<category><![CDATA[web-mediated services]]></category>

		<guid isPermaLink="false">http://www.coblentzclan.com/needtoknow/?p=8</guid>
		<description><![CDATA[I mentioned in my previous post that I was stunned to realize that search as a technology was really only about 18 years old. I just didn&#8217;t realize. No wonder it doesn&#8217;t work well.
The other thing that I&#8217;ve observed is that most people, including architects, designers, product managers, etc., don&#8217;t really think through the [...]]]></description>
			<content:encoded><![CDATA[<p>I mentioned in my previous post that I was stunned to realize that search as a technology was really only about 18 years old. I just didn&#8217;t realize. No wonder it doesn&#8217;t work well.</p>
<p>The other thing that I&#8217;ve observed is that most people, including architects, designers, product managers, etc., don&#8217;t really think through the question of how the users will invoke search. What are these people searching for? Why are they searching for it at all? How will they approach the job at hand?</p>
<p>As one of the stakeholders at CM&amp;A around search, indexing, analytics, and visualization, I get asked a lot of questions about search and what we&#8217;re planning to do about it. I also get to ask questions back like, &#8220;What kinds of search do you do?&#8221; and, &#8220;How well does it work?&#8221; Most of the time the answer is that it doesn&#8217;t work very well and &#8220;I prefer to search the internet as opposed to my own intranet&#8221;. It&#8217;s an interesting statistic that about 50% of all keyword searches fail. That is, the user has to modify the query keywords and try again. And that&#8217;s just the interfaces like Google, which are pretty minimalistic.</p>
<p>I admit it, I&#8217;m not a big fan of PowerLink and, according to most customers I talk to, they aren&#8217;t either. I think one reason is a disconnect between how customers and employees approach the question of search and how search was implemented. So that&#8217;s today&#8217;s topic.</p>
<p>I started a line of research that I call &#8220;the psychology of search&#8221;; it focuses on how people approach the problem/question. I think there are 3 kinds of search:</p>
<p><strong>Recovery</strong>, which is looking for a specific thing in a specific place (<em>&#8220;Honey, have you seen my car keys?&#8221;</em>);</p>
<p><strong>Discovery</strong>, which is about looking something up (<em>&#8220;Okay, what&#8217;s on the calendar for today?&#8221;</em>); and</p>
<p><strong>Exploration</strong>, which is akin to information research (<em>&#8220;Hmm, what are the competitors doing?&#8221;</em>).</p>
<p>The graphic below might sum this up:<br />
<a href="http://www.coblentzclan.com/needtoknow/wp-content/uploads/2008/06/general-types-of-search.jpg"><img class="alignnone size-medium wp-image-9" title="general-types-of-search" src="http://www.coblentzclan.com/needtoknow/wp-content/uploads/2008/06/general-types-of-search-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>These are not the academic definitions &#8211; I will refer you to <a class="jive-link-external" href="http://www.sigir.org/forum/F2002/broder.pdf">Broder&#8217;s Taxonomy</a> for that &#8211; but customers seem to get the picture for this pretty well.</p>
<p>These generalized categories arose from this slightly more detailed list of search types we&#8217;ve seen people execute:</p>
<ul>
<li><strong>Specialized Search</strong> -Seeking specific information whose characteristics are tightly defined</li>
<li><strong>Generalized Search</strong> -Unrefined, broad, loosely defined. Not concerned with completeness of results&#8230;</li>
<li><strong>Legal Investigation</strong>
<ul>
<li>Defendant &#8211; an exhaustive search of all sources and assets for specific information. Generally tightly defined. Completeness of results is essential. Results must bear intense scrutiny from opposing counsel. Must provide transparency into idiosyncrasies.</li>
<li>Plaintiff &#8211; a detailed examination of assets for specific information. Typically more loosely defined. Resembles Knowledge Discovery in that the plaintiff&#8217;s attorneys will want to &#8220;cast a broad net&#8221; for information.</li>
</ul>
</li>
<li><strong>Knowledge Discovery (KM)</strong> -R&amp;D, Patent Analytics. Generally an unstructured search with a few keywords</li>
<li><strong>Business Intelligence</strong> -Data mining, trolling, pattern recognition, trending analysis. Usually performed on historical data archives to refine operational behaviors or processes</li>
<li><strong>Internet Search</strong> -The main categories for such queries are shopping, finding various web-mediated services, downloading various types of files (documents, etc.), accessing certain databases (e.g. Lexis-Nexis type data), finding servers (e.g. for research), etc.</li>
<li><strong>Situational Awareness</strong> -Watching or monitoring specific data sources for activities, trends, events. Generally accomplished by text mining of sources like LexisNexis, etc.</li>
<li><strong>Navigational</strong> -Characterized by knowing you have something but not remembering where you put it. Could have very high query rates if widely deployed. People prefer categorical navigation.</li>
</ul>
<p>I think once you realize that there are all kinds of search types and that these need to be supported well, you will start thinking about the kinds of search and indexing technologies you will use to help your customers. Until this happens, though, search will be a very dissatisfying experience.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coblentzclan.com/needtoknow/?feed=rss2&amp;p=8</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search: A historical perspective</title>
		<link>http://www.coblentzclan.com/needtoknow/?p=5</link>
		<comments>http://www.coblentzclan.com/needtoknow/?p=5#comments</comments>
		<pubDate>Wed, 04 Jun 2008 21:22:41 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Too much data]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[Momentum]]></category>
		<category><![CDATA[organizational]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[technology scene]]></category>
		<category><![CDATA[Ted Nelson]]></category>
		<category><![CDATA[Tim Berners-Lee]]></category>
		<category><![CDATA[web immigrant]]></category>
		<category><![CDATA[web native]]></category>

		<guid isPermaLink="false">http://www.coblentzclan.com/needtoknow/?p=5</guid>
		<description><![CDATA[I suspect that this entry will get a few comments. I hope it does.
I was putting together my deck for EMC World and thought that maybe it would make good blog material. People are always asking me about search, classification, analytics, and visualization anyway, so I thought, why not?
I started looking into the history of [...]]]></description>
			<content:encoded><![CDATA[<p><em>I suspect that this entry will get a few comments. I hope it does.</em></p>
<p>I was putting together my deck for EMC World and thought that maybe it would make good blog material. People are always asking me about search, classification, analytics, and visualization anyway, so I thought, why not?</p>
<p>I started looking into the history of search engines and I was very surprised to find that this is really new to the technology scene. I mean, full text is really only 18 years old. When I think about it, for me, search was always there. I worked in an advanced research shop in Mountain View, CA back in the early 80&#8217;s and was always on the internet. All these arcane applications that most people never heard of: Gopher, Finger, Archie, etc. (Unless you are really old, like me and Dave Reiner). I was using Netscape before it was a company. Okay, I&#8217;m old. I&#8217;m not a web native. I&#8217;m a web immigrant. But full text apparently got started just before I started at this R&amp;D company, so to me, search was just natural. I thought it was always there. Little did I know.</p>
<p>A bit of a historical perspective:</p>
<ul>
<li>In 1945, <a class="jive-link-external" href="http://en.wikipedia.org/wiki/Vannevar_Bush">Vannevar Bush</a> published &#8220;As We May Think,&#8221; the essay that described the memex, a hypothetical machine for storing and linking documents. (Wikipedia lists him as, &#8220;A leading figure in the development of the military-industrial complex and the military funding of science in the United States, Bush was a prominent policymaker and public intellectual (&#8216;the patron saint of American science&#8217;) during World War II and the ensuing Cold War, and was in effect the first presidential science advisor&#8221;).</li>
<li>In the 1960&#8217;s, <a class="jive-link-external" href="http://en.wikipedia.org/wiki/Gerard_Salton">Gerard Salton</a> invented the <a class="jive-link-external" href="http://en.wikipedia.org/wiki/SMART_Information_Retrieval_System">SMART Information Retrieval System</a>. Many important concepts in information retrieval were developed as part of research on the <a class="jive-link-external" title="ftp://ftp.cs.cornell.edu/pub/smart/" href="ftp://ftp.cs.cornell.edu/pub/smart/">SMART</a> system, including the <a class="jive-link-external" title="Vector space model" href="http://en.wikipedia.org/wiki/Vector_space_model">vector space model</a> and <a class="jive-link-external" title="Relevance feedback" href="http://en.wikipedia.org/wiki/Relevance_feedback">relevance feedback</a>.</li>
<li>In 1960, <a class="jive-link-external" href="http://en.wikipedia.org/wiki/Ted_Nelson">Ted Nelson</a> created <a class="jive-link-external" title="Project Xanadu" href="http://en.wikipedia.org/wiki/Project_Xanadu">Project Xanadu</a> with the goal of creating a computer network with a simple user interface. It didn&#8217;t succeed. <a class="jive-link-external" href="http://en.wikipedia.org/wiki/Project_Xanadu">http://en.wikipedia.org/wiki/Project_Xanadu</a></li>
<li>In 1963, <a class="jive-link-external" href="http://en.wikipedia.org/wiki/Ted_Nelson">Ted Nelson</a> coined the term &#8220;<a class="jive-link-external" title="Hypertext" href="http://en.wikipedia.org/wiki/Hypertext">hypertext</a>&#8221;.</li>
<li>In 1969, ARPA launched the ARPANET, which &#8220;was the world&#8217;s first operational <a class="jive-link-external" title="Packet switching" href="http://en.wikipedia.org/wiki/Packet_switching">packet switching</a> network, and the predecessor of the global <a class="jive-link-external" title="Internet" href="http://en.wikipedia.org/wiki/Internet">Internet</a>.&#8221;</li>
<li>In 1990, Alan Emtage created the first search engine, &#8220;<a class="jive-link-external" href="http://en.wikipedia.org/wiki/Archie_search_engine">Archie</a>&#8221; (short for &#8220;Archives&#8221;).</li>
<li>1991: Tim Berners-Lee combines hypertext with TCP/IP and invents the WWW. First WWW site, <a class="jive-link-external" href="http://info.cern.ch/">http://info.cern.ch/</a>, posted August 6, 1991.</li>
<li>1994: Berners-Lee founds the World Wide Web Consortium (W3C) at MIT</li>
</ul>
<h2>Today: We know we have too much data.</h2>
<p>I find it interesting to look at some old presentation materials from User Conferences (Documentum folks call these User Conferences &#8220;Momentum&#8221;; EMC&#8217;ers call it EMC World. Same thing).</p>
<p>Here&#8217;s a Momentum presentation by a customer (Nortel Networks) back in 1998:</p>
<p>The meat of the discussion: back in 1998, 7 years after the invention of the WWW, Nortel already had the &#8220;too much data, not enough information, and I can&#8217;t find any of it&#8221; problem. Wow.</p>
<p><a href="http://www.coblentzclan.com/needtoknow/wp-content/uploads/2008/06/nortel-slide-2.gif"><img class="alignnone size-medium wp-image-4" title="nortel-slide-2" src="http://www.coblentzclan.com/needtoknow/wp-content/uploads/2008/06/nortel-slide-2-300x225.gif" alt="" width="300" height="225" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coblentzclan.com/needtoknow/?feed=rss2&amp;p=5</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

