Re: Technical Specifications

From: Michael McCartney <michael.mccartney_at_oag.state.ny.us>
Date: Tue, 28 Aug 2007 12:26:27 -0400

Jay:

Great! Please give Brad a call at 716-568-4820 or (716) 783-1215 to coordinate the install of the software and communication protocols, and whether we want to have the DB local or keep it remote. I would prefer to have it stored locally, but let's work it out as we move forward.

Please call Brad as soon as possible so we can be done testing and get up and running by the end of the week.

Mike

  

Michael G. McCartney
Sr. Special Investigator
New York State Office of the Attorney General
Statler Towers
107 Delaware Avenue, Room 4-130
Buffalo, New York 14202
vm 716-853-8539
cell 716-983-4635
e-mail: michael.mccartney_at_oag.state.ny.us

>>> Jay Mairs <jay_at_mediadefender.com> 08/27/07 5:20 PM >>>
Michael,

The downloading software is ready to be put onto your computer for testing.
We'll have to coordinate how to get the software set up on your computer.
The simplest solution would be for us to have a temporary remote login to
the machine so my developer can set up the software. If that isn't possible,
we can come up with another solution.

The software is currently saving the files in a local folder with the file
hash as the filename. The file metadata (file hash, filename, etc.) and the
NY IP(s) sharing the file are currently stored on our server. We can leave
it on our server and make it available for secure queries or we can store a
copy of it on your computer.

Regards,

Jay Mairs
Mediadefender, Inc.

On 8/17/07 3:23 PM, "Jay Mairs" <jay_at_mediadefender.com> wrote:

> Michael,
>
> Sorry, the data was separated by media type, and I only sent you the
> video data. I had the data consolidated into one dataset. The data
> file I've attached now includes video, image, and archive (rar, zip)
> data. We're still tweaking the keyword filtering, so I will continue to
> send you new data sets as improvements are made. I'd just like you to
> eyeball them and give any feedback you want.
>
> Regards,
>
> Jay Mairs
> Mediadefender, Inc.
>
>
> -----Original Message-----
> From: Michael McCartney [mailto:michael.mccartney_at_oag.state.ny.us]
> Sent: Friday, August 17, 2007 9:31 AM
> To: Jay Mairs
> Subject: RE: RE: Technical Specifications
>
> Jay:
>
> A couple of things. First, I can appreciate the effort that has to go
> into developing something like this. But please understand that this
> matter is being overseen by the highest members of this agency and time
> is always of the essence. So please do what you can do to expedite this
> so that we can begin testing and roll out as soon as possible.
>
> To that end, the second thing is: what would you like me to do with that
> file? A quick review of it scares the heck out of me. Are we
> anticipating only video files in this arena? I do not see any basic
> image files identified by the keywords. I agree that the keywords will
> most definitely need to be "tweaked" to limit the number of false
> positives. I can go through the list and, based on the file name of the
> file identified, make a general determination (without looking at it) as
> to whether it would be relevant or irrelevant for our purposes. That
> might help us identify which keywords need to be modified. I am
> curious as to how this is all going to come together on our end.
> Please feel free to give me a call and let me know when we can begin to
> start kicking the tires....
>
> Mike
>
> Michael G. McCartney
> Sr. Special Investigator
> New York State Office of the Attorney General
> Statler Towers
> 107 Delaware Avenue, Room 4-130
> Buffalo, New York 14202
> vm 716-853-8539
> cell 716-983-4635
> e-mail: michael.mccartney_at_oag.state.ny.us
>
> >>> "Jay Mairs" <jay_at_mediadefender.com> 08/16/07 3:49 PM >>>
> Michael,
>
> I have several of my senior developers working on getting this out, so
> we will get the software to you as soon as we can.
>
> Also, I've attached a first pass of the filtered data from the keyword
> list you sent. It's a file called resultset.html. It contains title
> and filename data for files that match your keywords and are shared by
> IP addresses in New York. It also contains the keyword that was matched
> for each file. The data is from one day of data collection (yesterday).
>
> One quick note about this first pass of filtering. You'll notice that a
> few of the keywords (young, kids, taboo, PT, etc.) bring in a lot of
> false positives. We are reprocessing the data without those keywords
> and I will be sending you those results as soon as I have them. We will
> continue to revise the filtering logic and, of course, we will put in
> any changes you request.
>
> Please let me know if you have any questions.
>
>
> Regards,
>
> Jay Mairs
> MediaDefender, Inc.
>
>
> -----Original Message-----
> From: Michael McCartney [mailto:michael.mccartney_at_oag.state.ny.us]
> Sent: Thursday, August 16, 2007 6:21 AM
> To: Jay Mairs
> Subject: Fwd: RE: Technical Specifications
>
> Jay:
>
> I am getting a great deal of pressure to get this part of our project
> off the ground. (See below). Are we at a point where we can start
> testing asap so we can go "live" by the end of the month?
>
> Please advise.
>
>
>
> Michael G. McCartney
> Sr. Special Investigator
> New York State Office of the Attorney General
> Statler Towers
> 107 Delaware Avenue, Room 4-130
> Buffalo, New York 14202
> vm 716-853-8539
> cell 716-983-4635
> e-mail: michael.mccartney_at_oag.state.ny.us
>
>
> >>> James Domres 8/14/2007 3:29 PM >>>
>
>
> >>> Peri Kadanoff 8/14/2007 12:27 PM >>>
> Michael, why are we waiting until the end of the month...? I am going
> to have a hard time explaining that!
>
> >>> "Jay Mairs" <jay_at_mediadefender.com> 8/10/2007 1:48 PM >>>
> Michael,
>
> Thanks for the keyword list. We don't have the Suffolk County keyword
> list, but we will be adding to the keyword list you sent based on the
> results we're seeing. The filtering logic will be an iterative process
> where we look at the results, tweak the logic and keyword list
> accordingly, then compare the new results to the old results, and so on.
>
> We should have software ready to test on your computer by the end of
> this month.
>
>
> Regards,
>
> Jay Mairs
> Mediadefender, Inc.
>
> -----Original Message-----
> From: Michael McCartney [mailto:michael.mccartney_at_oag.state.ny.us]
> Sent: Thursday, August 09, 2007 1:35 PM
> To: Jay Mairs; Jay Mairs; Bradley Bartram; Peri Kadanoff
> Cc: Ben Grodsky; James Domres
> Subject: RE: Technical Specifications
>
> Jay: Attached is a keyword list for the crawler. I was wondering if
> you still had the Suffolk County Keyword list available? It might have
> some additional terms and words that we could add to ours. Not sure how
> effective their list was or how ours will be for that matter. As you
> know, keyword searching is not very effective. Of course our list may
> also have to be "wildcarded" a bit to make better use of some of the terms
> and to minimize erroneous hits as best as possible. As far as the hash
> library, after much digging, it appears that the only format we have
> this DB in is MD5, not SHA1, so I do not think it will be of much use on
> the P2P side.
>
> Let us know when you are ready to start testing. We are ready to start
> testing immediately. Just let us know when you are ready on your end.
>
> Talk to you soon.
>
> Michael McCartney
>
> Michael G. McCartney
> Sr. Special Investigator
> New York State Office of the Attorney General
> Statler Towers
> 107 Delaware Avenue, Room 4-130
> Buffalo, New York 14202
> vm 716-853-8539
> cell 716-983-4635
> e-mail: michael.mccartney_at_oag.state.ny.us
>
> >>> "Jay Mairs" <jay_at_mediadefender.com> 08/03/07 7:20 PM >>>
> Brad,
>
> Once the software is set up on your computer, it will be automatic.
>
> As far as the timeline goes, I'm hoping to have a testable version of
> the software within the next 2 weeks. By testable, I mean that the
> networking functionality is complete, so that the operation of the
> system can be tested.
>
> The development of the filtering logic will be an iterative process that
> will begin once we get a draft of the keyword list. Depending on what
> kind of results we are seeing from the keywords you give us, we may have
> to tweak our filtering to maximize the quality of the results. I won't
> have a concrete timeline on that until we start seeing results from the
> keyword list you send.
>
>
> Regards,
>
> Jay Mairs
> MediaDefender, Inc.
>
>
> -----Original Message-----
> From: Bradley Bartram [mailto:Bradley.Bartram_at_oag.state.ny.us]
> Sent: Friday, August 03, 2007 1:19 PM
> To: Jay Mairs; Jay Mairs
> Cc: Ben Grodsky; James Domres; Michael McCartney
> Subject: RE: Technical Specifications
>
> Jay;
>
> Thanks for that clarification.
>
> With regards to number 2, Mike McCartney (who is copied on these emails)
> is still compiling that list for you.
>
> Do you have any estimation on time frame for testing this application?
> Also, when we put the "collector", for lack of a better term, on our
> server, will it be an automatic process for the two parts of the
> software to communicate, or will there be some user intervention here?
>
> Thanks
>
> Brad Bartram
>
>
> >>> "Jay Mairs" <jay_at_mediadefender.com> 08/03/07 4:00 PM >>>
> Brad,
>
> 1. Our development on the data collection software is to the point where
> we are ready to do filtering based on a list of keywords. Do you have a
> list of appropriate keywords for us to use in our filtering?
>
> 2. I've been talking to the developers in more depth about the
> communication between our server and the software running on your
> computer. In addition to the encrypted SQL queries I mentioned in the
> last message, there is an HTTP connection made to our server. The data
> transferred over this connection is encrypted.
>
>
> Regards,
>
> Jay Mairs
> MediaDefender, Inc.
>
>
> -----Original Message-----
> From: Jay Mairs
> Sent: Wednesday, August 01, 2007 2:40 PM
> To: Bradley Bartram
> Cc: Ben Grodsky; James Domres; Michael McCartney
> Subject: RE: Technical Specifications
>
> Brad,
>
> The remote queries will be encrypted using SSL connections. We use
> MySQL, which is able to use SSL connections.
>
>
> Jay Mairs
> MediaDefender, Inc.
>
>
> -----Original Message-----
> From: Bradley Bartram [mailto:Bradley.Bartram_at_oag.state.ny.us]
> Sent: Wednesday, August 01, 2007 11:39 AM
> To: Jay Mairs; Jay Mairs
> Cc: Ben Grodsky; James Domres; Michael McCartney
> Subject: RE: Technical Specifications
>
> Jay;
>
> Thank you for your response.
>
> Everything looks pretty good in terms of architecture and such. The
> thing that I'd like to discuss at this point is how we can securely make
> the database calls between our respective systems.
>
> With that in mind, what would you recommend for this secure
> communication? Is your application web-based, where we could make
> queries over an SSL-encrypted HTTPS tunnel, or is it a client-server
> design where we need to begin thinking about setting up a dedicated VPN?
>
> Thanks
>
> Brad Bartram
>
>
> >>> "Jay Mairs" <jay_at_mediadefender.com> 07/31/07 6:57 PM >>>
> Brad,
>
> I'll just follow your numbering format for my reply:
>
> 1. Your architectural overview is correct, except that we would have our
> data collection system writing to a database which the downloading
> software would make remote queries to.
>
> a. The software runs on Linux, specifically CentOS 5.
> b. The downloading computer (your computer) will query a DB on our
> server for the list of files to download. This will be done
> automatically.
> c. Our system currently uses remote SQL queries to get the list
> of files to download.
>
> 2. The hardware we are currently running on has the following general
> specs:
> Dual or Quad core x86 processor, 2-3 GHz
> 2-4 GB of RAM
> ~1 TB of HD storage (this could be more or less depending on your
> data retention requirements)
> Possibly RAID, depending on your requirements
>
> If you have any other questions, please let me know.
>
>
> Regards,
>
> Jay Mairs
> MediaDefender, Inc.
>
>
> -----Original Message-----
> From: Bradley Bartram [mailto:Bradley.Bartram_at_oag.state.ny.us]
> Sent: Tuesday, July 31, 2007 6:23 AM
> To: Jay Mairs
> Cc: Ben Grodsky; James Domres; Michael McCartney
> Subject: Technical Specifications
>
> Jay,
>
> I wanted to touch base with you regarding some things that we need to
> clarify and set up before we are able to go "live" with this project.
> I'll bullet-point them here, but if you have any questions, feel free to
> call me directly at the contact information listed below.
>
> 1. Based on the design of how we are setting up this system to satisfy
> the legal and evidentiary requirements, our understanding of the
> architecture is as follows:
>
> - On your end, the peer-to-peer crawler will be identifying files
> matching the established search criteria from various hosts. This data
> will then be collected and filtered for New York resident IP addresses (to
> the accuracy limits imposed by geo-query tech). The data will then be
> transferred to us where:
> - On our end, a separate piece of software will use that data to
> connect into the network and download the file from a host and store it
> on our servers for evidence retention and further analysis.
>
> So, with that bird's-eye architectural overview in mind, I need
> to work out several technical items:
>
> a. The component of your software running on our systems needs
> what specific environment? Is it Windows-based? Randy, when we met
> with him last week, believed your software runs on Linux, but was not
> sure. Neither system is a problem for us to deal with.
> b. The data being transferred between your system and ours:
> what format will that be in? Will your software be able to
> automatically deal with it or will there be some form of intervention
> required on our end either in the form of a manual load or scripted
> automation?
> c. What will be the most effective way to transfer the data
> from your system to ours? XML-RPC, SOAP, some other method in a batch
> type file, or database dump?
>
> 2. We will be ordering equipment to handle this portion of the data
> collection. Based on your software's requirements, as well as your
> knowledge and experience in working with collected digital media
> (obviously not directly in this specific case, but similar nonetheless),
> what would you recommend in terms of storage, processing power, and
> memory? I want to make sure that we don't start and find the system we
> get is underpowered.
>
> I believe that at this point, that covers the basic topics that need to
> be addressed. You can either respond via email, or if you prefer, we
> can arrange a conference call to cover these points as well.
>
> Thank you for your time.
>
> Brad Bartram
>
>
>
> Bradley J. Bartram
> Intelligence Analyst
> New York State Office of the Attorney General
> Statler Towers
> 107 Delaware Avenue, Room 4-130
> Buffalo, New York 14202
> vm 716-853-8542
> cell 716-783-1215
> e-mail: bradley.bartram_at_oag.state.ny.us
>
>
>
>
>
>
>
>
Received on Fri Sep 14 2007 - 10:56:03 BST

This archive was generated by hypermail 2.2.0 : Sun Sep 16 2007 - 22:19:48 BST