RE: Data collection status check.

From: Ben Grodsky <grodsky_at_mediadefender.com>
Date: Mon, 20 Aug 2007 12:55:12 -0700

Chris,
 
We're uploading the .mp3s to you as we download them from p2p networks. We currently aren't set up to archive these mp3s for you locally on our machines. Is that a problem?
 
Are the database dumps and mp3 files in a useable/useful format for you?
 
Potential problems we started considering include: (1) once we start uploading mp3s to you from all networks, we may max out the bandwidth connection on your FTP or (2) your FTP may run out of disk space.
 
Thanks,
Ben

________________________________

From: Ben Grodsky
Sent: Thu 16-Aug-07 17:50
To: 'christopher.bell_at_umusic.com'
Cc: Jay Mairs
Subject: RE: Data collection status check.

Chris,
 
This is just a short follow-up to my last e-mail. We have these data: Ares data since 7/24; Gnutella data since 7/19; Kad data since 7/11. We've put some .mp3 files on your FTP that the system we've been developing for you downloaded using your tracks' hashes. Please check those mp3s and let us know how everything seems to you. We're making a lot of progress on the other networks' hash downloaders as well.
 
Thanks,
Ben

________________________________

From: Ben Grodsky
Sent: Tue 14-Aug-07 13:00
To: 'christopher.bell_at_umusic.com'
Cc: Jay Mairs
Subject: Re: Data collection status check.

Chris,

We'll change the 'temp' table names to the format you prefer ('network_YYYY_MM_DD'). The hash counts ARE unique IPs seen that day.

I have to check per network how far back we have consistent data. I'll get back to you ASAP on this (probably by tomorrow).

Thanks,
Ben

----- Original Message -----
From: Bell, Christopher <christopher.bell_at_umusic.com>
To: Ben Grodsky; Jay Mairs
Sent: Mon Aug 13 18:21:13 2007
Subject: Re: Data collection status check.

Thanks Ben,

These three dumps all came out of a table called 'temp'. I presume that was just a quirk of this test.

To avoid lots of manual clean-up, I'd ask that each day's data for a given network be broken out in a unique table more like your earlier dumps (network_YYYY_MM_DD would be great).

I'm ultimately going to join all of the artist/track information back to our list of one hundred pairs (in table track_list from your previous dump) so if it's easier to report by track number (track_list.track_num) instead of Artist/Title, that would be fine with me.

Since the data are going to be reported on a daily accumulated basis instead of as unique data collection points, I want to make sure that you are summing the unique user results, correct? In other words, if the same user continues to have the same file available all day, you aren't counting that user multiple times in a given day.
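
[Editorial sketch, not part of the original message] The per-day table naming and the "count each user once per day" rule discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `table_name` and `daily_unique_counts` are hypothetical, a "user" is assumed to be identified by IP address, and the observation tuples are made up for the example. It does not describe the system actually used.

```python
from collections import defaultdict
from datetime import date


def table_name(network: str, day: date) -> str:
    """Build a per-network, per-day table name in the network_YYYY_MM_DD style."""
    return f"{network.lower()}_{day:%Y_%m_%d}"


def daily_unique_counts(observations):
    """Count distinct users per file hash for a single day.

    `observations` is an iterable of (file_hash, user_ip) pairs collected
    during one day (hypothetical input shape). A user seen repeatedly with
    the same hash is counted only once.
    """
    users_by_hash = defaultdict(set)
    for file_hash, user_ip in observations:
        users_by_hash[file_hash].add(user_ip)
    return {file_hash: len(ips) for file_hash, ips in users_by_hash.items()}


# Example: the same user seen twice with hash "h1" counts once.
sample = [
    ("h1", "10.0.0.1"),
    ("h1", "10.0.0.1"),
    ("h1", "10.0.0.2"),
    ("h2", "10.0.0.1"),
]
counts = daily_unique_counts(sample)
name = table_name("Ares", date(2007, 8, 13))
```

Using a set per hash makes the de-duplication explicit: repeated sightings of the same user within the day collapse to a single entry before counting.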

Finally, once the reporting format is locked down, how far back can you provide the hash results? What was the first date of complete collection?

Thanks,
Chris

--
| Christopher J Bell | Advanced Technology | Universal Music Group eLabs
| christopher.bell_at_umusic.com | 310.865.8495 | 310.865.9950 (fax)
________________________________
From: Ben Grodsky <grodsky_at_mediadefender.com>
Date: Mon, 13 Aug 2007 17:26:36 -0700
To: "Bell, Christopher" <christopher.bell_at_umusic.com>, Jay Mairs <jay_at_mediadefender.com>
Conversation: Data collection status check.
Subject: RE: Data collection status check.
Chris,
Please check your ftp again.  We've sent a newly formatted .sql file with artist and track fields filled in.  The .sql filename itself includes the p2p network name and a date stamp for when those data were collected.  We're focused now on developing the actual p2p file downloading and will provide you sample files when those become available.
Thanks,
Ben
________________________________
From: Bell, Christopher [mailto:christopher.bell_at_umusic.com]
Sent: Mon 13-Aug-07 14:48
To: Ben Grodsky; Jay Mairs
Subject: Re: Data collection status check.
Ben, Jay,
The first for-sale date of any new files is August 21.  It did change to this new date.
The data dumps you sent are missing some key data.  What I received is tables of hash values and counts apparently per network per day.  There's no mapping to which track each hash value matches.
Additionally, since there still haven't been any sample downloaded files, I'm not clear on how those files will actually be named and how I'll map them back to the collected hash file data.
Just so we're on the same page, here's what we are expecting (this is taken directly from our agreement for this project, but I suspect that you may have never seen this language):
2(a)  MediaDefender will provide UMG with weekly reports in a format mutually agreed upon by the parties that includes the following information for each unique File Hash that corresponds to a Search Pair: (i) artist name, (ii) track name, (iii) peer-to-peer network where the file hash was found, (iv) file hash value, (v) number of users sharing the file hash and (vi) the date and time stamp of each observation of the File Hash. 
2(b) MediaDefender will provide UMG with the downloaded files that are representative of each unique File Hash (the "Files") via an FTP service provided by UMG.  MediaDefender will identify each File by the data collected and reported in Section 2(a) using a scheme mutually agreed upon by the parties.  For the avoidance of doubt, MediaDefender is not required to download multiple Files from a given File Hash; provided however that MediaDefender shall include every occurrence of a given File Hash in the reporting required pursuant to Section 2(a).  Notwithstanding the foregoing, during the first week of File Hash data collection, MediaDefender shall download Files corresponding to a sample of ten or more unique files from one or more given File Hashes in order to verify process effectiveness and to verify that the proper automated file-to-data mapping procedures are in place. 
While the commitment is only no downloads for the first week past agreement execution (now long since past), we've viewed your proposal to only download everything after launch of new sales as sufficient.  That said, the test files are still important.
Please call if you have any questions or when there are updated files to review.
Thanks,
Chris
--
| Christopher J Bell | Advanced Technology | Universal Music Group eLabs
| christopher.bell_at_umusic.com | 310.865.8495 | 310.865.9950 (fax)
________________________________
From: Ben Grodsky <grodsky_at_mediadefender.com>
Date: Mon, 13 Aug 2007 12:45:28 -0700
To: Jay Mairs <jay_at_mediadefender.com>, "Bell, Christopher" <christopher.bell_at_umusic.com>
Conversation: Data collection status check.
Subject: RE: Data collection status check.
Chris,
I'm trying to coordinate programming schedules on our end and wanted some clarification from you when these non-DRMed files were going to start going out.  We had discussed in that phone call a few weeks ago that these files would start going out 12/13 August (today), but online news articles are talking about 21 August.
Also, I know we had discussed giving you data dumps weekly and actually downloading files to your ftp once monthly.  Have you taken a look yet at the data files Jay's team has provided on your FTP server?
Thanks,
Ben Grodsky
Director of Operations
MediaDefender, Inc.
W: 310.956.3355
AIM: grodskymd
grodsky_at_mediadefender.com
________________________________
From: Jay Mairs
Sent: Thu 09-Aug-07 14:25
To: Bell, Christopher; Ben Grodsky
Subject: RE: Data collection status check.
Hi Chris,
There should be some sample data dumps on your FTP server.  The data is a daily list of file hashes with source counts.  Let me know if you have any issues or questions.
Regards,
Jay Mairs
From: Bell, Christopher [mailto:christopher.bell_at_umusic.com]
Sent: Tuesday, August 07, 2007 11:56 AM
To: Jay Mairs; Ben Grodsky
Subject: Re: Data collection status check.
Thanks Jay,
Looking forward to seeing first results.
Chris
--
| Christopher J Bell | Advanced Technology | Universal Music Group eLabs
| christopher.bell_at_umusic.com | 310.865.8495 | 310.865.9950 (fax)
________________________________
From: Jay Mairs <jay_at_mediadefender.com>
Date: Tue, 7 Aug 2007 11:44:27 -0700
To: "Bell, Christopher" <christopher.bell_at_umusic.com>, Ben Grodsky <grodsky_at_mediadefender.com>
Conversation: Data collection status check.
Subject: RE: Data collection status check.
Hi Chris,
The file hash collection is going well.  Currently, the developers are focusing on processing the data into the format you asked for (file hashes with source counts).  In particular, processing a day's worth of Ares data currently takes about a full day, so we have to do some optimization (or sampling) in order to have a robust system.
As far as data transfer goes, I'll have my guys send DB data to the FTP server path you mentioned within the next day or two.
Regards,
Jay Mairs
 
From: Bell, Christopher [mailto:christopher.bell_at_umusic.com]
Sent: Monday, August 06, 2007 4:04 PM
To: Ben Grodsky; Jay Mairs
Subject: Data collection status check.
Ben, Jay,
I wanted to follow up on our 7/26 call and see how the data collection is going.  Do you have a draft database I can check out?
I had an action item to provide you with an accessible ftp directory to drop the file.  The directory and login is in place, but I can't find a record of me sending the details to you so I'm including them below as a back-up:
ftpumg.umusic.com
U: elabs02
P: GuE55!
The drop location for the database is /collection/db/
The drop location for the download files is /collection/files/
Let me know where we stand and an eta for getting the first data drop.
Thanks,
Chris
--
| Christopher J Bell | Advanced Technology | Universal Music Group eLabs
| christopher.bell_at_umusic.com | 310.865.8495 | 310.865.9950 (fax)
Received on Fri Sep 14 2007 - 10:56:05 BST
