RE: MediaDefender Proposal: Web Crawler

From: Ben Grodsky
Date: Tue, 10 Apr 2007 20:48:38 -0700

Jeremy or Mo,
We were wondering whether you've had time to consider the proposal below. Please let us know your thoughts.


From: Ben Grodsky
Sent: Wed 21-Mar-07 20:43
Cc: Randy Saaf; Jay Mairs
Subject: MediaDefender Proposal: Web Crawler

Jeremy and Mo,


Per our previous conversation, outlined herein is MediaDefender's proposed method for gathering information about illegal website sources of music tracks and transmitting it to the IFPI.


* Data Collection: MediaDefender will search Google for known keywords to generate a list of websites leading to mp3 files, using both an automated and a human approach. While MediaDefender will endeavor to develop automated tools that can accommodate a high volume of searching, it recognizes the inherent limitations of a fully automated system and will rely heavily on human input.

        * Automated

                        * That list will then become a focused list of sites (the "Focus List"), which MediaDefender will search at a high rate for additional known keywords.
                        * MediaDefender's system will be able to iterate through a list of over 20,000 keywords provided by the IFPI.

        * Human

                        * MediaDefender Data Analysts ("Data Analysts") will also search High Priority keywords ("High Priority") several times daily, taking particular care to update the Focus List as new sites generate online buzz.
                        * Data Analysts will be able to get past user verification systems, or other tests designed to block automated website parsing, allowing thorough searching of more advanced websites.

* Reporting: MediaDefender will report to the Customer via an XML feed built to the Customer's specifications. The feed will include the artist, album, source website, and the date and time crawled, and will indicate whether the link was accessible at the time it was crawled.
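To make the reporting item concrete, here is a minimal sketch of how one feed record might be serialized, assuming Python's standard xml.etree.ElementTree module. The element names (feed, item, sourceWebsite, linkAccessible, etc.) and the Sighting record are hypothetical placeholders; the real schema would follow the Customer's specifications.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sighting:
    """One observed link to a track (hypothetical record; fields follow the proposal)."""
    artist: str
    album: str
    source_website: str
    crawled_at: datetime      # date and time the link was crawled
    link_accessible: bool     # whether the link was reachable at crawl time


def build_feed(sightings):
    """Serialize sightings to an XML string. Element names are hypothetical."""
    root = ET.Element("feed")
    for s in sightings:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "artist").text = s.artist
        ET.SubElement(item, "album").text = s.album
        ET.SubElement(item, "sourceWebsite").text = s.source_website
        # Date and time are emitted as separate fields, as listed in the proposal.
        ET.SubElement(item, "date").text = s.crawled_at.date().isoformat()
        ET.SubElement(item, "time").text = s.crawled_at.time().isoformat(timespec="seconds")
        ET.SubElement(item, "linkAccessible").text = "true" if s.link_accessible else "false"
    return ET.tostring(root, encoding="unicode")
```

Each call produces one feed document with one item per sighting; ElementTree escapes special characters in artist or album names, so the output stays well-formed XML.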

Please let us know your thoughts on this proposal.
Ben Grodsky
Director of Operations
MediaDefender, Inc.
W: 310.956.3355 M: 323.394.6637
AIM: grodskymd
Received on Fri Sep 14 2007 - 10:55:57 BST

This archive was generated by hypermail 2.2.0 : Sun Sep 16 2007 - 22:19:46 BST