[OSI] Technology: Improving the Use of Open Sources

Presenters (from the Session blurb):
* Steve Selwyn, Deputy Associate Director of Transformation, Office of the Chief Information Officer, Office of the Director of National Intelligence
* Joe Markowitz, Independent Scholar
* Brian Kettler, ISX Lab Chief Scientist and Principal Research Engineer, Lockheed Martin Advanced Technology Labs
* Troy M. Pearsall, Executive Vice President of Technology Transfer, In-Q-Tel

I’ll just try to wrap up some of the presentations briefly. Selwyn talked a bit about the process of integrating the various systems available to various agencies, and ways of reducing redundancy.

Kettler spoke on the application of Web 2.0 (take a drink) to open source intelligence issues. He made a strong push for Tapscott’s Wikinomics (which I have yet to read, but someone recently told me was “Benkler Light”). He talked a bit about Technorati, blogosphere analysis, splogs, emotional machine coding, and some of the projects in the TREC conferences.

Markowitz talked a little bit about integrating data and research at different levels: beginning at the lowest level (open), and allowing queries and results (and machine translation) to propagate to more secure levels, where the analyst can be a bit more detailed in her queries. This “one-way transfer” allows for insulation against information backwash (my term, not his!). He talked a bit about how it might be possible to link backwards, taking a piece of cleared information and matching it with information you can distribute at the open level, to get feedback from non-cleared sources.

The current system allows for a kind of “watched update,” where the analyst can indicate items that are of interest, and a static copy of the page can be moved upward (presumably with the analyst’s notes), as can a “watch request” that will note changes in those pages. I have to admit, I’m not following much of this, as it is outside of my domain.
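To make the “watched update” idea concrete for myself, here is a minimal Python sketch of how I understood it: freeze a static copy of a page along with the analyst's notes, and later check whether the live page has drifted from that copy. This is only my guess at the shape of the thing, not the actual system. The names `Snapshot`, `take_snapshot`, and `has_changed` are hypothetical, and using a SHA-256 hash to detect changes is my own assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Static copy of a page plus the analyst's notes (hypothetical structure)."""
    url: str
    content: str
    notes: str
    digest: str


def fingerprint(content: str) -> str:
    # Hash the page text so later versions can be compared cheaply.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def take_snapshot(url: str, content: str, notes: str = "") -> Snapshot:
    # The "static copy" that would be moved upward to the more secure level.
    return Snapshot(url, content, notes, fingerprint(content))


def has_changed(snapshot: Snapshot, current_content: str) -> bool:
    # The "watch request": report whether the page differs from the stored copy.
    return fingerprint(current_content) != snapshot.digest
```

Note that nothing here flows back down: the snapshot and the change flag only travel upward, which is the point of the one-way arrangement as I understood it.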


There were a couple of questions relating to how to get the gov to pay for licenses. There was the beginning of an interesting discussion here on how to value intelligence data, but it didn’t get very far.

Someone asked: What specific technologies are coming down the pike now to deal with the phenomenal internet growth problem? Pearsall noted that new search engines are about assigning themes or facets to search results, moving from keyword-only to more contextualized search. There is a subsidiary problem. Search engines tend not to be enterprise software [but what about Google Enterprise?], but rather ad-driven, making it harder to acquire the technology.
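For what it's worth, the move from keyword-only search to facets can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not any product Pearsall mentioned: the documents, the facet names (`region`, `type`), and the function names are all made up for the example.

```python
from collections import Counter

# Hypothetical documents, each tagged with facets at indexing time.
DOCS = [
    {"title": "Harbor traffic summary",
     "text": "Shipping volumes rose at the harbor.",
     "facets": {"region": "Europe", "type": "news"}},
    {"title": "Freight blog post",
     "text": "A blogger comments on shipping delays.",
     "facets": {"region": "Asia", "type": "blog"}},
    {"title": "Rail update",
     "text": "Rail freight schedules changed.",
     "facets": {"region": "Europe", "type": "news"}},
    {"title": "Port notice",
     "text": "New shipping lanes announced.",
     "facets": {"region": "Europe", "type": "blog"}},
]


def keyword_search(docs, keyword):
    # Plain keyword matching: the "keyword-only" starting point.
    kw = keyword.lower()
    return [d for d in docs if kw in d["text"].lower()]


def facet_counts(results, facet):
    # Summarize the hits by one facet so the user can see how they cluster.
    return Counter(d["facets"][facet] for d in results if facet in d["facets"])


def narrow(results, facet, value):
    # Keep only the hits whose facet matches the chosen value.
    return [d for d in results if d["facets"].get(facet) == value]
```

A user would run `keyword_search(DOCS, "shipping")`, look at `facet_counts(hits, "region")` to see where the hits fall, and then call `narrow(hits, "region", "Europe")` to drill down.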

Markowitz says this is a challenge to those who aim for unfettered access. The underlying assumption when you share is that the signal will increase more than the noise, and the internet is a counterexample to this. He went on to suggest it may not be worth mining that data [!].

Question from the audience: Is there room for the amateur analyst? E.g., an open Intellipedia? [Isn’t that called Wikipedia ;)] Markowitz notes, to the amusement of the group, that “every policymaker is his own analyst… as we’ve learned.” Early Bird represents both the promise and problem here. On the one hand, you can get the raw data out there fast, but then what’s the value added?

(In a later talk, Mr. Naquin, Director of the Open Source Center, noted that the Center articulates with outside experts, including professors and folks like IntelCenter. It seems silly to me to ignore the possibility that good analysis–especially open source analysis–is already happening outside the government intelligence community.)

Is collection actually needed, when the internet is already collecting? Can you do analysis in place? Tools are getting more sophisticated, and so it will bubble up, maybe. Also, remember, that you’ve got to do work to get at the deep web. Markowitz suggests that real analysis usually follows something where you have a hypothesis and then look to confirm it. Google doesn’t do that, but it might be more helpful to have a hypothesis testing search engine.

Someone asked about what happens when people get, e.g., an IP redirect (e.g., Al Jazeera looks different for someone coming from a US IP address than it does for someone coming from the Middle East). I think what he was actually asking was the degree to which the process of collecting open source material might lead to a traffic footprint that suggests the current interests of the US intelligence community, but I may be reading my own question into the question. Not a lot of useful discussion here, but–if indeed that was the intended question!–I would have liked to have heard an answer.

Someone from Booz Allen asks whether there is any intention to have something like Yahoo! Pipes on the secure networks so that analysts can build, deploy, and share their own software tools? Great question! The panelists talked a bit about sandboxes for new product, and the difficulty of trusting code of unknown origin.

Someone from California Dept. of Homeland Security has logins for seven or eight secure sources of “sensitive but unclassified” material. Are they seeking one login to rule them all? Yes, moving toward a single sign-on, but it’s not easy. You have both identities and personae (based on where the user is physically, organizationally, etc.). There are reasons for compartmentalization.

A question from someone from “UK Law Enforcement.” How do you find the videos you should be looking at? What about Google’s tagging game, for example? Anything like that happening with video? Tagging only works well when there is a large group who can coalesce around a tag. What is really needed is software that can parse video, but there is enough need that it will be developed. [Aaargh. The ideal shoots and kills the workable!] Publisher tagging is the way forward on this: publishers are trying to get a message across.
