OPTIMISATION IS AT THE HEART OF EVERYTHING WE DO AT CODEZERO. WHETHER IT IS COMPUTE, STORAGE, NETWORKING, DATA ANALYSIS OR DATA SCIENCE, OUR PRINCIPLES REMAIN THE SAME.
Ten years ago we started to experience virtualisation sprawl; we are now in the era of data sprawl. We are firm believers that, if we are not careful, data will become the new plastic.
Businesses are collecting and storing more data than ever before, and with the rise of data lakes, data warehouses and data analytics, they are being told that more data is always better; after all, this is supposedly the only way to achieve results through AI, ML and HPC. As a result, businesses forget good housekeeping practices such as identifying and removing ROT (Redundant, Obsolete and Trivial) data, assuming that an object-based cloud repository will hold it all at low cost. The catch in public cloud is that the storage line item is only part of the bill: you pay request charges to put data in, egress charges to pull data out, and further API charges every time you access the data once it is there.
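To make that concrete, here is a minimal sketch of what an object-store bill actually adds up to. Every unit price below is a made-up placeholder for illustration, not any provider's actual rate card:

    # Illustrative object-storage cost model. All unit prices below are
    # hypothetical placeholders, not any cloud provider's actual rates.

    STORAGE_PER_GB_MONTH = 0.023   # assumed $/GB-month for standard storage
    EGRESS_PER_GB = 0.09           # assumed $/GB transferred out
    PRICE_PER_1K_REQUESTS = 0.005  # assumed $ per 1,000 PUT/GET API requests

    def monthly_cost(stored_gb, egress_gb, requests):
        """Estimate the all-in monthly bill, not just the storage line item."""
        storage = stored_gb * STORAGE_PER_GB_MONTH
        egress = egress_gb * EGRESS_PER_GB
        api = (requests / 1_000) * PRICE_PER_1K_REQUESTS
        return storage + egress + api

    # 50 TB stored, 5 TB pulled back out, 20 million API calls in a month:
    print(f"${monthly_cost(50_000, 5_000, 20_000_000):,.2f}")

Even in this toy example, roughly a third of the bill comes from moving and touching the data rather than storing it.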
The same applies to on-premises storage: customers are likely relying on the native dedupe and compression capabilities of their SAN and NAS deployments, but deduping and compressing redundant data carries an expensive overhead of its own. If we target only meaningful data, we can significantly reduce the data overhead and ensure that businesses only ever pay for what they need, with a good handle on the most efficient ways to collect and store data.
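A simple first pass at targeting only meaningful data is to find byte-identical duplicates before paying to store, dedupe or compress them. The sketch below is our own illustration (the /data path is a placeholder), not part of any product:

    # Minimal duplicate-file finder: a first pass at identifying the
    # "Redundant" in ROT before paying to store, dedupe or compress it.
    import hashlib
    from collections import defaultdict
    from pathlib import Path

    def sha256_of(path, chunk_size=1 << 20):
        """Stream the file so large files never need to fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def find_duplicates(root):
        """Group files under `root` by content hash; groups of 2+ are redundant."""
        by_hash = defaultdict(list)
        for path in Path(root).rglob("*"):
            if path.is_file():
                by_hash[sha256_of(path)].append(path)
        return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

    for digest, paths in find_duplicates("/data").items():
        print(digest[:12], [str(p) for p in paths])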
It is easy to get caught up in the hype cycle and purchase the latest and greatest servers, switches, storage appliances and GPUs for AI, ML and HPC, but the reality is that if the dataset is correctly defined, we do not need as many resources, or as much power, to run the models and computations required to achieve the business goal.
Thanks to our software engineering and development capabilities, it is not just the architecture and data science that we can optimise: we can work with codebases and scripts to make the most efficient use of the resources available to us. Enterprise Application Performance Monitoring and Observability platforms can help optimise performance and resources, but they come with a hefty price tag, so where applicable we prefer open-source software such as Prometheus (a time-series database) and Grafana (data visualisation and dashboards).
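As a flavour of how lightweight this stack can be, the following sketch uses the prometheus_client Python library to expose a custom utilisation gauge for Prometheus to scrape and Grafana to chart; the metric name and the use of psutil as the value source are our own illustrative choices:

    # Expose a custom gauge for Prometheus to scrape (and Grafana to chart).
    # Requires: pip install prometheus-client psutil
    import time
    import psutil
    from prometheus_client import Gauge, start_http_server

    # Metric name is our own illustrative choice, not a standard.
    cpu_gauge = Gauge("codezero_cpu_utilisation_percent",
                      "Host CPU utilisation as seen by the exporter")

    if __name__ == "__main__":
        start_http_server(8000)  # metrics served at http://localhost:8000/metrics
        while True:
            cpu_gauge.set(psutil.cpu_percent(interval=None))
            time.sleep(5)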
We can use both of these technologies to capture performance and utilisation statistics and combine them with our own proprietary software to provide 'No Code' data visualisation. A number of businesses invest large sums of money in tools such as Tableau and Power BI on the basis that they are data analytics tools, but they are really tools for data analysts, which means they add licensing overhead on top of an already considerable people cost.
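Once the statistics are in Prometheus, pulling them back out for a visualisation layer is a plain HTTP call against Prometheus's standard /api/v1/query endpoint. A minimal sketch, assuming a Prometheus server on localhost scraping the exporter above:

    # Query Prometheus's HTTP API for the current value of a metric.
    # Assumes a Prometheus server at localhost:9090 scraping the exporter above.
    import requests

    PROM_URL = "http://localhost:9090/api/v1/query"

    def instant_query(promql):
        """Run an instant PromQL query and return (labels, value) pairs."""
        resp = requests.get(PROM_URL, params={"query": promql}, timeout=10)
        resp.raise_for_status()
        results = resp.json()["data"]["result"]
        return [(r["metric"], float(r["value"][1])) for r in results]

    for labels, value in instant_query("codezero_cpu_utilisation_percent"):
        print(labels, value)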
Project Demeter showcases CodeZero running Kaggle-based datasets in AWS and on Zadara hardware, leveraging NVIDIA T4, A2 and A40 GPUs. We are able to present resource utilisation statistics that show customers just how much compute and storage is actually needed to train machine learning models.
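The kind of measurement behind those statistics can be gathered with NVIDIA's NVML bindings. A minimal sketch, with an arbitrary sampling loop, that logs per-GPU core and memory utilisation while a model trains:

    # Sample GPU utilisation with NVIDIA's NVML bindings.
    # Requires: pip install nvidia-ml-py  (imported as pynvml)
    import time
    import pynvml

    pynvml.nvmlInit()
    try:
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                   for i in range(pynvml.nvmlDeviceGetCount())]
        for _ in range(12):  # sample once a second for 12 seconds (arbitrary)
            for i, handle in enumerate(handles):
                util = pynvml.nvmlDeviceGetUtilizationRates(handle)
                mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
                print(f"gpu{i}: {util.gpu}% core, "
                      f"{mem.used / mem.total:.0%} memory in use")
            time.sleep(1)
    finally:
        pynvml.nvmlShutdown()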
Our ‘No Code’ data science software, Wormwood, automates around 90% of the menial, repetitive daily tasks that data scientists perform. This represents a significant cost and time saving that can be reinvested in innovation and moving ahead of the competition.
OUR PARTNERS: