PlanetFlow

schema problems

CREATE TABLE hostnames (
hostname varchar(255),
ip int unsigned,
PRIMARY KEY (hostname)
) PACK_KEYS=1;

--
-- Table of flows tables
--
CREATE TABLE flow_tables (
table_name varchar(64), -- name of the table
src_ip int unsigned, -- IP source address
slice varchar(32), -- slice that sent these flows
start_time datetime, -- start time of the earliest flow in the table
end_time datetime, -- end time of the latest flow in the table
flows int unsigned, -- total number of flows in this table
packets bigint unsigned, -- total number of packets sent in all flows in this table

June 26, 2007

The database is horribly inefficient.  Looking into a new schema that will use the same data inputs and actually use relations within the database to link the data.  Currently, a table is created for each flow and then an entry is made into a main table which keeps track of all the flow tables.  In order to query this data, you have to run 2 queries: 1 to the main TOC table to find the name of the flow table, then another to query that table and get the flow data.  This is a perfect example of how NOT to use a relational database.  Looking into the most effecient storage model and nomalization for this system.  2NF should be fine due to the nature of the data.

PlanetFlow Update

PlanetFlow

Works in progress:

  • DB normalization/optimization
  • GUI update
  • Usability analysis
  • Kernal update
  • New data storage and retrieval
Syndicate content