A global energy company had fallen on some hard technical times in one of their divisions. A number of their developers had moved on, leaving behind a list of issues that needed attention, just as a flood of new client activity was knocking on their door: new markets coming onboard and long-standing clients migrating from a legacy platform. A consultant we had partnered with previously reached out to us to come on board and help out.
Some of the issues they were looking for us to address were:
- Expansion of current functional capabilities and new feature development
- Major and minor adjustments to their API
- Performance problems with some of their critical endpoints
- Neo4j configuration problems
- Issues with their code architecture that were negatively impacting the items above
As we were onboarding, it became apparent that deep technical knowledge was missing from the group, so we had to roll up our sleeves, dig into the tech stack, and get up to speed quickly to make some much-needed changes and improvements. Some of the items we had to tackle without direction were:
- Understanding the current codebase
- Building a local sandbox
- Deployments to their test environments and production
- Database configuration
Neo4j Clustering and Transactions
While getting up to speed, we noticed that they were running Neo4j on multiple instances but without proper clustering, so only one instance responded to requests. The scale of requests running through the API and database was problematic without proper clustering in place. We fixed the clustering configuration, separating leader and follower Neo4j nodes, so all instances were utilized in responding to requests.
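For reference, a causal-cluster configuration along these lines might look like the following neo4j.conf fragment. This is an illustrative sketch, not the client's actual config: hostnames are placeholders, and the exact setting names vary by Neo4j version (these are the 4.x names).

```properties
# Run this instance as a core member of the causal cluster.
dbms.mode=CORE

# Don't form a cluster until three core members are available.
causal_clustering.minimum_core_cluster_size_at_formation=3

# Placeholder hostnames for the discovery endpoints of the core members.
causal_clustering.initial_discovery_members=core1:5000,core2:5000,core3:5000
```

With the cluster formed, clients connect using the `neo4j://` routing scheme rather than pointing at a single host, so reads can be routed across followers instead of piling onto one instance.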
There was little to no transaction management implemented, risking the integrity of the graph database. For example, if an endpoint needed to update multiple relationships across multiple nodes but failed partway through, there was no rollback, leaving the graph in an inconsistent state. We fully implemented transaction management to deal with those concerns.
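In practice this meant leaning on the database driver's explicit transactions (begin, commit, rollback). The all-or-nothing pattern boils down to something like the following pure-Python sketch — the class, names, and toy data are all hypothetical, not the client's code:

```python
# Illustrative sketch of all-or-nothing updates with rollback.
# Each endpoint's multi-node updates run as one unit of work; if any
# step fails, the steps already applied are undone in reverse order,
# so the graph is never left half-updated.

class UnitOfWork:
    """Collects (apply, undo) operation pairs and runs them atomically."""

    def __init__(self):
        self._ops = []          # pending (apply, undo) pairs
        self._applied = []      # undo callbacks for completed steps

    def add(self, apply_fn, undo_fn):
        self._ops.append((apply_fn, undo_fn))

    def commit(self):
        try:
            for apply_fn, undo_fn in self._ops:
                apply_fn()
                self._applied.append(undo_fn)
        except Exception:
            # Roll back the completed steps, newest first.
            for undo_fn in reversed(self._applied):
                undo_fn()
            self._applied.clear()
            raise


# Toy usage: the second step fails, so the first is rolled back.
graph = {"a-knows-b": False, "b-knows-c": False}

def fail():
    raise RuntimeError("mid-update failure")

uow = UnitOfWork()
uow.add(lambda: graph.update({"a-knows-b": True}),
        lambda: graph.update({"a-knows-b": False}))
uow.add(fail, lambda: None)

try:
    uow.commit()
except RuntimeError:
    pass

print(graph)  # both relationships remain unchanged
```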
While tackling early issues such as clustering and transaction management, we ramped up on domain knowledge: learning more about their data collection processes, their internal policies and procedures, and, above all, their data model.
After settling in, one of the next assignments was improving performance. We began by exploring their logs in Splunk to determine which API calls were used most often, ranked by slowest average response time and by max response time outliers. With that information, we could inventory and prioritize the endpoints that needed improvement.
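The ranking logic itself is simple; expressed in Python rather than our actual Splunk queries, it looks roughly like this (the endpoint names and timings are invented):

```python
# Sketch of ranking endpoints by average and worst-case response time
# from (endpoint, response_ms) log records. Data below is invented.
from collections import defaultdict

def rank_endpoints(records):
    """records: iterable of (endpoint, response_ms) pairs."""
    times = defaultdict(list)
    for endpoint, ms in records:
        times[endpoint].append(ms)
    stats = [
        {
            "endpoint": ep,
            "calls": len(ms_list),
            "avg_ms": sum(ms_list) / len(ms_list),
            "max_ms": max(ms_list),
        }
        for ep, ms_list in times.items()
    ]
    # Slowest average first -- these are the candidates to fix.
    return sorted(stats, key=lambda s: s["avg_ms"], reverse=True)

logs = [
    ("/api/assets", 70), ("/api/assets", 90),
    ("/api/markets", 5), ("/api/markets", 7),
    ("/api/reports", 750),
]
for row in rank_endpoints(logs):
    print(row["endpoint"], round(row["avg_ms"], 1), row["max_ms"])
```

In Splunk itself, this is roughly a `stats avg(...) / max(...) by endpoint` query followed by a sort, but the point is the same: measure first, then fix the endpoints the numbers point at.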
Following architecture, code, and database changes, there were some dramatic improvements:
- Average response time decreased from ~70ms to ~4ms for a number of the most frequently called endpoints
- One average response time outlier was reduced from ~750ms to ~30ms
- A problematic set of endpoints that previous teams had worked on without successful improvement dropped from 3-10 seconds to under 75ms
For endpoints averaging tens of millions of calls per month, these decreases in response time are critical.
Fast Tracking a Project
Following the performance improvement project, the client quickly diverted us to a project that needed to be fast-tracked. The business had contracts to enter new markets and realized a lot of development changes were needed to meet the new requirements. After early design meetings and planning sessions, the timeline for delivery looked like 3+ months, but we worked closely with the product teams to focus efforts and cut our delivery time to just over one month. We did that by:
- Working closely with product teams to understand requirements and find answers to questions earlier in the process
- Planning workloads focused only on the features required for the project and backlogging nonessential work to revisit afterward
- Limiting meetings to only those necessary for making progress against the fast-tracked project milestones
- Increasing communication across teams around feature handoff, so that a team could start immediately once a feature was ready for them to take control
Other Enhancements and Improvements
After the fast track project, we took on a multitude of other challenges summarized below:
- Leveraged the Repository/Service design pattern to consolidate business logic and data-access code in an organized manner
- Optimized code to eliminate brute-force logic for reducing large amounts of data that was negatively impacting JSON throughput
- Implemented bulk upload functionality to enhance the onboarding of new markets and data migration from other internal systems
- Cleaned up the data model for clarity, efficiency, and maintainability
- Performed ongoing Neo4j maintenance
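To illustrate the Repository/Service split, here is a minimal sketch under assumed names — `MarketRepository`, `MarketService`, and the onboarding logic are all hypothetical, standing in for the client's actual classes. The repository owns all data access; the service owns business rules and never touches the database directly:

```python
# Minimal Repository/Service sketch (illustrative names and data).

class MarketRepository:
    """Data-access layer: the only place that talks to the database."""

    def __init__(self, db):
        self._db = db  # e.g. a Neo4j session; here, a plain dict

    def find(self, market_id):
        return self._db.get(market_id)

    def save(self, market_id, record):
        self._db[market_id] = record


class MarketService:
    """Business-logic layer: validates and orchestrates, no raw queries."""

    def __init__(self, repo):
        self._repo = repo

    def onboard_market(self, market_id, name):
        if self._repo.find(market_id) is not None:
            raise ValueError(f"market {market_id} already onboarded")
        self._repo.save(market_id, {"name": name, "active": True})
        return self._repo.find(market_id)


db = {}
service = MarketService(MarketRepository(db))
record = service.onboard_market("us-east", "US East")
print(record)  # {'name': 'US East', 'active': True}
```

The payoff of the split is testability and reuse: business rules can be exercised against an in-memory repository, and swapping the storage layer does not ripple through the service code.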
A premier global energy company needed some help. They had lost developers and were left with a backlog of feature work to open up new markets, bugs requiring fixes, and performance issues. OBLSK jumped in successfully, taking on all of the work initially scoped and then more as the relationship grew. Not only did we make huge improvements in the areas listed above, we also cleaned up large pieces of the codebase, allowing for faster feature development moving forward while keeping the code highly maintainable.