Here is some more info:
Source / Data Creation
We retrieve data from a variety of sources. Primarily, we extract data out of emails that we have retrieved via POP or IMAP from various ISPs. This data consists of things like the subject line, the date the email was sent, the message identification number, and whether the email landed in the spam folder or the inbox. This process is stable and unlikely to change greatly. However, given that this data feeds the rest of the process, we do want to ensure the extraction is efficient and that where and how this data is stored is clearly defined.
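As one concrete reference point, here is a minimal sketch of the extraction step, assuming IMAP access via Python's standard imaplib; the host, account, and folder names are placeholders, not our actual configuration. Spam-versus-inbox placement is inferred from the folder a message was found in.

```python
import email
import imaplib
from email.utils import parsedate_to_datetime

# Hypothetical connection details -- the real host, account, and folder
# names ("Junk" vs "Spam", etc.) vary by ISP.
HOST = "imap.example-isp.com"
USER = "seed-account@example.com"
PASSWORD = "secret"

def extract_messages(folder, placement):
    """Pull subject, sent date, and Message-ID from every message in a folder."""
    conn = imaplib.IMAP4_SSL(HOST)
    conn.login(USER, PASSWORD)
    conn.select(folder, readonly=True)
    _, data = conn.search(None, "ALL")
    records = []
    for num in data[0].split():
        _, msg_data = conn.fetch(num, "(RFC822.HEADER)")
        msg = email.message_from_bytes(msg_data[0][1])
        sent = msg.get("Date")
        records.append({
            "subject": msg.get("Subject"),
            "sent_date": parsedate_to_datetime(sent) if sent else None,
            "message_id": msg.get("Message-ID"),
            "placement": placement,  # inferred from which folder it landed in
        })
    conn.logout()
    return records

rows = extract_messages("INBOX", "inbox") + extract_messages("Junk", "spam")
```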
INITIAL & SUBSEQUENT Data Storage
We want to store this data in the most ideal fashion. Right now all data is stored in relational SQL tables. However, this may or may not be efficient enough to support real-time information retrieval, which is one of our goals. Thus, should we have a system where the INITIAL data storage is SQL tables, and the data is then moved to a SUBSEQUENT storage method for real-time information exchange?
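One possible answer is sketched below, assuming SQLite as the INITIAL relational store and Redis as the SUBSEQUENT real-time layer; the email_events table, the key scheme, and the redis-py client are illustrative assumptions, not a settled design.

```python
import json
import sqlite3
import redis  # assumes the third-party redis-py client is installed

# INITIAL storage: the relational tables the extraction process writes to.
sql = sqlite3.connect("email_data.db")

# SUBSEQUENT storage: an in-memory store keyed for fast per-client reads.
cache = redis.Redis(host="localhost", port=6379)

def promote_recent_rows(client_id):
    """Copy the latest extracted rows for one client into the real-time store."""
    rows = sql.execute(
        "SELECT message_id, subject, sent_date, placement "
        "FROM email_events WHERE client_id = ? "
        "ORDER BY sent_date DESC LIMIT 1000",
        (client_id,),
    ).fetchall()
    payload = json.dumps([
        {"message_id": m, "subject": s, "sent_date": d, "placement": p}
        for (m, s, d, p) in rows
    ])
    # One key per client keeps tenants separated and reads cheap.
    cache.set(f"client:{client_id}:recent", payload, ex=300)  # 5-minute TTL
```

Under this split, the SQL tables remain the durable system of record, and clients only ever read from the real-time layer.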
Client Real-Time Data Retrieval / Push System
We need to build a SECURE, EXTENSIBLE, SCALABLE, and ROBUST system that allows our MANY clients to access the data in real time over the internet.
Secure - we want to ensure a client cannot access another client's data. We also want to ensure non-clients can't get to it at all (see the sketch after this list for a per-client authentication approach).
Extensible - we want the platform & infrastructure to be extensible to other types of data. Initially we would be providing data extracted from emails; in the future it may be different types of data whose SOURCE is different, and potentially their INITIAL data storage as well. Ideally the framework for communication (e.g. how we transfer the info) will be "extensible" so that we can use this same framework for new applications.
Scalable - it's essential that all efficiencies to manage peaks in load be considered. Additionally, is it better to PUSH data to clients, to allow them to PULL data from us, or to provide options for BOTH? (The sketch after this list shows both models.)
Simple - we want to minimize the back & forth between client and us. Each transfer of data should be independent and not reliant on us having successfully stored information regarding a prior request. How, when, and where do we use caching?
Robust - we want to be able to provide everything from simple to more complex data. We want to be flexible in the options we provide for accessing the data.
Different Applications of Data on client side - clients may use the data to integrate into other processing applications (e.g. deployment engines), clients may want to create a web-based report with the data (in which case, do we transfer style sheets?), and clients may want to insert this data into their own database tables. Does this change anything about how we send it or what else we send?
Reliable - it's critical that downtime be limited to minutes. How do we build a real-time backup system that is used as a failover in case the primary goes down?
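To make the Secure, Scalable, and Simple points concrete, here is a minimal sketch of a stateless PULL endpoint plus an optional webhook PUSH, reusing the Redis layer from the previous sketch. Flask, the API-key scheme, and all names and URLs here are assumptions for illustration only.

```python
import json
import redis
import requests
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379)

# Hypothetical per-client API keys; in practice these would live in a
# secrets store, and all traffic would run over HTTPS.
API_KEYS = {"key-abc123": "client_42"}

@app.route("/v1/email-data")
def pull_email_data():
    """PULL model: each request is self-contained -- key in, data out, no session."""
    client_id = API_KEYS.get(request.headers.get("X-Api-Key", ""))
    if client_id is None:
        abort(401)  # Secure: unknown keys get nothing; a key maps to exactly one client.
    payload = cache.get(f"client:{client_id}:recent")
    return jsonify(json.loads(payload) if payload else [])

def push_to_client(client_id, webhook_url):
    """PUSH model: we POST fresh data to a URL the client registered with us."""
    payload = cache.get(f"client:{client_id}:recent")
    if payload:
        requests.post(webhook_url, json=json.loads(payload), timeout=10)
```

Because every request carries its own credentials and the server keeps no per-request state, any instance behind a load balancer can answer it, which is also what makes failing over to a hot standby practical.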
Hardware/Software
What is needed to build and support all of the above?