RE: Supply results

From: Ben Grodsky <grodsky_at_mediadefender.com>
Date: Thu, 30 Aug 2007 16:04:54 -0700

Andrew,
 
The answers to your questions are below and should be in blue:
 

1. How does searching work on ed2k? Does a query look for keywords in the filename?

The data set you included is for ed2k supply, which isn't gathered by queries. When a user connects to a server, he uploads a file list of what files he has available for sharing. We're parsing these file lists for your projects' keywords on each filename -- so this doesn't have anything to do with queries/searches, as such.
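As a rough illustration of the supply-side matching described above, here is a minimal Python sketch. The keyword table, the (hash, filename) file-list layout, and the function names are assumptions made for the example, not the actual MediaDefender parser. It only shows that filenames from an uploaded file list are matched against project keywords, with no search query involved.

```python
import re

# Hypothetical project keywords; the real keyword sets are not given in this thread.
PROJECT_KEYWORDS = {
    "Scoop": ["scoop"],
    "Serenity": ["serenity"],
}

def tokenize(filename):
    """Lowercase a filename and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", filename.lower()))

def match_supply(file_list, keywords=PROJECT_KEYWORDS):
    """Return (project, file_hash, filename) for every shared file whose
    name contains all of a project's keywords."""
    hits = []
    for file_hash, filename in file_list:
        tokens = tokenize(filename)
        for project, words in keywords.items():
            if all(word in tokens for word in words):
                hits.append((project, file_hash, filename))
    return hits

# A made-up file list, as one user might upload on connecting to a server.
file_list = [
    ("hash_a", "(DivX - ITA) - Scoop - (DVDRIPP).avi"),
    ("hash_b", "holiday_photos.zip"),
]
hits = match_supply(file_list)
```

Only the first entry would be reported as supply here, because its filename contains the keyword "scoop".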

 

2. Were all the mixed results because of searches for multiple assets? I.e., one search was done for Serenity and another one for Munich, but they both coincidentally point to the same file hash.

The pattern you're identifying here, many filenames attached to one hash, is the natural "cross-naming effect" seen on several p2p networks. Multiple filenames become associated with the same file hash when people rename the same file and begin sharing it. This happens naturally very often, and can be used by companies such as MediaDefender for various protection-type exploits (but MediaDefender decoys are being filtered from your data, so that's probably not something you need to be concerned with here).
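To make the cross-naming effect concrete, the following sketch groups supply records by file hash and reports every hash seen under more than one filename. The record layout and the sample hashes and filenames are invented for illustration; they are not taken from filehashexamples.xls.

```python
from collections import defaultdict

def names_per_hash(records):
    """Map each file hash to the set of filenames it was shared under."""
    names = defaultdict(set)
    for file_hash, filename in records:
        names[file_hash].add(filename)
    return names

def cross_named(records):
    """Return only the hashes that appear under more than one filename."""
    return {
        file_hash: sorted(filenames)
        for file_hash, filenames in names_per_hash(records).items()
        if len(filenames) > 1
    }

# Hypothetical records: hash_1 is the same content shared under two names.
records = [
    ("hash_1", "Serenity.avi"),
    ("hash_1", "Munich.avi"),
    ("hash_2", "Scoop.avi"),
    ("hash_2", "Scoop.avi"),
]
result = cross_named(records)
```

A hash listed in `result` matches keywords for more than one asset even though it is a single file, which is how one hash can show up under both Serenity and Munich.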

 

3. What type of data can you pull from an ed2k search? Are you able to grab all the IPs of users sharing a file just from searching on the network?

From an ed2k search (the ones marked as "demand" in your ed2k data from us) you can't pull much of anything beyond the data we're already providing you -- as far as user identification goes, that is the IP of the person who made the search. Ed2k is server-based, so you would never be able to see things truly network-wide -- KAD is more useful for global views. Even so-called "global" searches on eMule (I'm referring to the searches that the eMule client refers to as "global") rely so much on the sequencing of servers in the user's server.met file that they aren't global at all (and in my opinion "global" is a complete misnomer for that button, but that's what the eMule client calls it, so I'm assuming that might be what you're referring to).

 

 

 Are partially downloaded files detected as supply? Or are only completed downloads? Is it safe to say that this user downloaded this file within ~4 hours from making his last request?

Yes, partially downloaded files are detected as supply. Strictly speaking, it is not safe to make any conjectures as to the completeness of files showing up as supply on ed2k; if the user has ANY data chunks of the file, he is considered a source for that hash, and at that point his file list would be updated to indicate he's a source (even if he has just 1 data chunk). So the only thing that is absolutely safe to say is that within ~4 hours of making that request, that user was identified as offering that hash.
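For reference, the "~4 hours" figure can be checked from the two timestamps in the quoted example (demand at 8/6/2007 4:17, supply at 8/6/2007 8:03). This sketch assumes the timestamps are month/day/year and in the same time zone, neither of which is stated in the data.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M"  # assumed ordering: month/day/year, 24-hour clock

def gap_between(demand, supply, fmt=FMT):
    """Return the time elapsed between a demand record and a supply record."""
    return datetime.strptime(supply, fmt) - datetime.strptime(demand, fmt)

gap = gap_between("8/6/2007 4:17", "8/6/2007 8:03")
print(gap)  # 3:46:00
```

The gap is 3 hours 46 minutes. It bounds when the user was first seen offering the hash, not when (or whether) the download completed.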

 

I hope that all made sense, if not let me know. Also, let me know how else we can help you and Necip.

 
Thanks,
Ben

________________________________

From: Ben Grodsky
Sent: Thu 30-Aug-07 12:47
To: Skinner, Andrew (NBC Universal)
Subject: RE: Supply results

Andrew,
 
I'm talking through your below questions with a few of our developers now. Sorry I didn't update you accordingly yesterday.
 
-Ben

________________________________

From: Skinner, Andrew (NBC Universal) [mailto:Andrew.Skinner_at_nbcuni.com]
Sent: Wed 29-Aug-07 13:51
To: Ben Grodsky
Subject: Supply results

Hi Ben,

 

Can you clarify the following points?

 

[filehashexamples.xls] In cases 2 and 3 there are many different filenames associated with a single filehash. This raises the following questions:

 

1. How does searching work on ed2k? Does a query look for keywords in the filename?

 

2. Were all the mixed results because of searches for multiple assets? I.e., one search was done for Serenity and another one for Munich, but they both coincidentally point to the same file hash.

 

3. What type of data can you pull from an ed2k search? Are you able to grab all the IPs of users sharing a file just from searching on the network?

 

[MD IP address examples.xls] In case 5 we see that a user searched for "dvx ita scoop" and then 4 hours later was detected as having Scoop as supply.

 

58.140.33.151   8/6/2007 4:17   demand   dvx ita scoop
58.140.33.151   8/6/2007 8:03   supply   (DivX - ITA) - Scoop - (DVDRIPP) by Gly-AsTrA - COMMEDIA...Woody Allen, Hugh Jac

 

 

1. Are partially downloaded files detected as supply? Or are only completed downloads? Is it safe to say that this user downloaded this file within ~4 hours from making his last request?

 

 

Thanks,

Andrew

 
Received on Fri Sep 14 2007 - 10:56:00 BST
