RE: Technical Specifications

From: Michael McCartney <michael.mccartney_at_oag.state.ny.us>
Date: Thu, 09 Aug 2007 16:34:58 -0400

Jay: Attached is a keyword list for the crawler. I was wondering if you still had the Suffolk County keyword list available? It might have some additional terms and words that we could add to ours. Not sure how effective their list was, or how ours will be for that matter. As you know, keyword searching is not very effective. Of course, our list may also have to be "wildcarded" a bit to make better use of some of the terms and to minimize erroneous hits as best as possible.

As far as the hash library, after much digging, it appears that the only format we have this DB in is MD5, not SHA1, so I do not think it will be of much use on the P2P side.
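To illustrate the format mismatch: an MD5 digest and a SHA-1 digest of the same data are unrelated values of different lengths, and one cannot be derived from the other without the original data. A minimal Python sketch, using empty input purely as a stand-in for file contents (the names `digests` and `md5_library` are made up for this example):

```python
import hashlib

def digests(data: bytes) -> dict:
    """Return the hex MD5 and SHA-1 digests of the same bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }

d = digests(b"")  # empty input used only as a placeholder

# A library that stores only MD5 values cannot answer SHA-1 lookups.
md5_library = {d["md5"]}

print(len(d["md5"]), len(d["sha1"]))  # 32 40
print(d["sha1"] in md5_library)       # False
```

Getting SHA-1 values for the same library would mean rehashing the original files, since neither digest can be converted into the other.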

Let us know when you are ready to start testing. We are ready to start immediately on our end.

Talk to you soon.

Michael McCartney

Michael G. McCartney
Sr. Special Investigator
New York State Office of the Attorney General
Statler Towers
107 Delaware Avenue, Room 4-130
Buffalo, New York 14202
vm 716-853-8539
cell 716-983-4635
e-mail: michael.mccartney_at_oag.state.ny.us

>>> "Jay Mairs" <jay_at_mediadefender.com> 08/03/07 7:20 PM >>>
Brad,

Once the software is set up on your computer, it will be automatic.

As far as the timeline goes, I'm hoping to have a testable version of
the software within the next 2 weeks. By testable, I mean that the
networking functionality is complete, so that the operation of the
system can be tested.

The development of the filtering logic will be an iterative process that
will begin once we get a draft of the keyword list. Depending on what
kind of results we are seeing from the keywords you give us, we may have
to tweak our filtering to maximize the quality of the results. I won't
have a concrete timeline on that until we start seeing results from the
keyword list you send.
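
As a rough illustration of the kind of keyword filtering under discussion (not MediaDefender's actual logic), the Python sketch below matches wildcard patterns against the tokens of a file name. The function name, patterns, and file names are hypothetical placeholders.

```python
import fnmatch
import re

def matches_keywords(filename, patterns):
    """Return the patterns that match at least one token of the file name.

    The name is lowercased and split on non-alphanumeric characters, so a
    pattern without a wildcard must match a whole token exactly.
    """
    tokens = [t for t in re.split(r"[^a-z0-9]+", filename.lower()) if t]
    return [
        p for p in patterns
        if any(fnmatch.fnmatchcase(t, p.lower()) for t in tokens)
    ]

print(matches_keywords("Holiday_Photos-2007.zip", ["photo*", "video"]))  # ['photo*']
print(matches_keywords("photograph.txt", ["photo"]))                     # []
```

Matching on whole tokens means a plain keyword does not fire on longer words that merely contain it, while a trailing `*` opts in to prefix matching. Which terms get wildcards would be part of the tuning.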

Regards,

Jay Mairs
MediaDefender, Inc.

-----Original Message-----
From: Bradley Bartram [mailto:Bradley.Bartram_at_oag.state.ny.us]
Sent: Friday, August 03, 2007 1:19 PM
To: Jay Mairs; Jay Mairs
Cc: Ben Grodsky; James Domres; Michael McCartney
Subject: RE: Technical Specifications

Jay;

Thanks for that clarification.

With regard to number 2, Mike McCartney (who is copied on these emails)
is still compiling that list for you.

Do you have any estimation on time frame for testing this application?
Also, when we put the "collector", for lack of a better term, on our
server, will it be an automatic process for the two parts of the
software to communicate, or will there be some user intervention here?

Thanks

Brad Bartram

>>> "Jay Mairs" <jay_at_mediadefender.com> 08/03/07 4:00 PM >>>
Brad,

1. Our development on the data collection software is to the point where
we are ready to do filtering based on a list of keywords. Do you have a
list of appropriate keywords for us to use in our filtering?

2. I've been talking to the developers in more depth about the
communication between our server and the software running on your
computer. In addition to the encrypted SQL queries I mentioned in the
last message, there is an http connection made to our server. The data
transferred over this connection is encrypted.

Regards,

Jay Mairs
MediaDefender, Inc.

-----Original Message-----
From: Jay Mairs
Sent: Wednesday, August 01, 2007 2:40 PM
To: Bradley Bartram
Cc: Ben Grodsky; James Domres; Michael McCartney
Subject: RE: Technical Specifications

Brad,

The remote queries will be encrypted using SSL connections. We use
MySQL, which is able to use SSL connections.
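
For illustration only, the Python sketch below builds client-side SSL/TLS settings that verify the server's certificate, using the standard `ssl` module. How such settings are handed to a MySQL client depends on the driver and is not shown. The function name is hypothetical, and nothing here reflects the actual configuration used.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Build a client context that verifies the server certificate."""
    # create_default_context() enables certificate and hostname checks
    # and loads the system's trusted CA certificates.
    return ssl.create_default_context()

ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```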

Jay Mairs
MediaDefender, Inc.

-----Original Message-----
From: Bradley Bartram [mailto:Bradley.Bartram_at_oag.state.ny.us]
Sent: Wednesday, August 01, 2007 11:39 AM
To: Jay Mairs; Jay Mairs
Cc: Ben Grodsky; James Domres; Michael McCartney
Subject: RE: Technical Specifications

Jay;

Thank you for your response.

Everything looks pretty good in terms of architecture and such. The
thing that I'd like to discuss at this point is how we can securely make
the database calls between our respective systems.

With that in mind, what would you recommend for this secure
communication? Is your application web-based, where we could make
queries over an SSL-encrypted HTTPS tunnel, or is it a client-server
design where we need to begin thinking about setting up a dedicated VPN?

Thanks

Brad Bartram

>>> "Jay Mairs" <jay_at_mediadefender.com> 07/31/07 6:57 PM >>>
Brad,

I'll just follow your numbering format for my reply:

1. Your architectural overview is correct, except that we would have our
data collection system writing to a database which the downloading
software would make remote queries to.

        a. The software runs on Linux, specifically CentOS 5.
        b. The downloading computer (your computer) will query a DB on our
server for the list of files to download. This will be done
automatically.
        c. Our system currently uses remote SQL queries to get the list
of files to download.

2. The hardware we are currently running on has the following general
specs:
        Dual or Quad core x86 processor, 2-3 GHz
        2-4 GB of RAM
        ~1TB of HD storage (this could be more or less depending on your
data retention requirements)
        Possibly RAID, depending on your requirements
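
The remote-query arrangement in item 1 can be pictured as a client polling a work-queue table. The Python sketch below uses an in-memory SQLite database as a stand-in for the remote server. The table name, columns, and function names are assumptions made for this example, not the actual schema.

```python
import sqlite3

def fetch_pending(conn, limit=10):
    """Return up to `limit` not-yet-fetched rows as (id, file_hash, host)."""
    cur = conn.execute(
        "SELECT id, file_hash, host FROM download_queue "
        "WHERE fetched = 0 ORDER BY id LIMIT ?",
        (limit,),
    )
    return cur.fetchall()

def mark_fetched(conn, row_id):
    """Flag one queue entry as done so it is not returned again."""
    conn.execute("UPDATE download_queue SET fetched = 1 WHERE id = ?", (row_id,))
    conn.commit()

# In-memory stand-in for the remote database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_queue ("
    "id INTEGER PRIMARY KEY, file_hash TEXT, host TEXT, "
    "fetched INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO download_queue (file_hash, host) VALUES (?, ?)",
    [("hash-a", "192.0.2.10"), ("hash-b", "192.0.2.11")],
)
conn.commit()

pending = fetch_pending(conn)
mark_fetched(conn, pending[0][0])
print(len(pending), len(fetch_pending(conn)))  # 2 1
```

Running the poll on a timer would make the process automatic, with no manual load step on the downloading side.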

If you have any other questions, please let me know.

Regards,

Jay Mairs
MediaDefender, Inc.

-----Original Message-----
From: Bradley Bartram [mailto:Bradley.Bartram_at_oag.state.ny.us]
Sent: Tuesday, July 31, 2007 6:23 AM
To: Jay Mairs
Cc: Ben Grodsky; James Domres; Michael McCartney
Subject: Technical Specifications

Jay,

I wanted to touch base with you regarding some things that we need to
clarify and set up before we are able to go "live" with this project.
I'll bullet-point them here, but if you have any questions, feel free to
call me directly at the contact information listed below.

1. Based on the design of how we are setting up this system to satisfy
the legal and evidentiary requirements, our understanding of the
architecture is as follows:

    - On your end, the peer-to-peer crawler will be identifying files
matching the established search criteria from various hosts. This data
will then be collected, filtered for New York resident IP addresses (to
the accuracy limits imposed by geo-query tech). The data will then be
transferred to us, where:
    - On our end, a separate piece of software will use that data to
connect into the network and download the file from a host and store it
on our servers for evidence retention and further analysis.

    So, with that bird's eye view architectural overview in mind, I need
to work out several technical items:

        a. The component of your software running on our systems needs
what specific environment? Is it Windows-based? Randy, when we met
with him last week, believed your software runs on Linux, but was not
sure. Neither system is a problem for us to deal with.
        b. The data being transferred between your system and ours:
what format will that be in? Will your software be able to
automatically deal with it or will there be some form of intervention
required on our end either in the form of a manual load or scripted
automation?
        c. What will be the most effective way to transfer the data
from your system to ours? XML-RPC, SOAP, some other method in a batch
type file, or database dump?

2. We will be ordering equipment to handle this portion of the data
collection. Based on your software's requirements, as well as your
knowledge and experience in working with collected digital media
(obviously not directly in this specific case, but similar nonetheless),
what would you recommend in terms of storage, processing power, and
memory? I want to make sure that we don't start and find the system we
get is underpowered.
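
The IP filtering step described in item 1 could be sketched as a lookup against a table of address ranges per region. In the Python example below, the region table, the addresses (reserved documentation ranges), and the function names are all hypothetical. A real deployment would rely on a geolocation database, with the accuracy limits already noted.

```python
import ipaddress

# Hypothetical region table built from documentation-only address ranges.
REGION_RANGES = {
    "NY": [ipaddress.ip_network("192.0.2.0/24")],
    "OTHER": [ipaddress.ip_network("198.51.100.0/24")],
}

def region_of(ip):
    """Return the region whose ranges contain `ip`, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for region, networks in REGION_RANGES.items():
        if any(addr in net for net in networks):
            return region
    return None

def filter_region(hosts, region="NY"):
    """Keep only the hosts that resolve to the requested region."""
    return [h for h in hosts if region_of(h) == region]

hosts = ["192.0.2.7", "198.51.100.9", "203.0.113.5"]
print(filter_region(hosts))  # ['192.0.2.7']
```

Addresses that fall in no known range are dropped here rather than guessed at. That is one possible policy, not a stated requirement.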

I believe that at this point, that covers the basic topics that need to
be addressed. You can either respond via email, or if you prefer, we
can arrange a conference call to cover these points as well.

Thank you for your time.

Brad Bartram

Bradley J. Bartram
Intelligence Analyst
New York State Office of the Attorney General
Statler Towers
107 Delaware Avenue, Room 4-130
Buffalo, New York 14202
vm 716-853-8542
cell 716-783-1215
e-mail: bradley.bartram_at_oag.state.ny.us
Received on Fri Sep 14 2007 - 10:55:55 BST
