Only 21 of the enterprises that offered AI network comments were doing any AI self-hosting, but all of them, and almost all of those seriously evaluating self-hosting, said that AI hosting meant a specialized cluster of computers with GPUs, and that this cluster would have to be connected both within itself and to the main points of storage for their core business data. They all saw this as a whole new networking challenge.
Every enterprise that self-hosted AI told me the mission demanded more bandwidth for “horizontal” traffic than their normal applications generate, more than their current data center network was built to support. Ten of the group said this meant the “cluster” of AI servers would need faster Ethernet connections and higher-capacity switches. Everyone agreed that a real production deployment of on-premises AI would need new network devices, and fifteen said they bought new switches even for their large-scale trials.
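To put rough numbers on that bandwidth gap, the sketch below estimates per-server east-west traffic for data-parallel training using a ring all-reduce; the model size, GPU count, precision, and step time are illustrative assumptions, not figures reported by any of these enterprises.

```python
# Rough estimate of "horizontal" (east-west) bandwidth per AI server during
# data-parallel training. Every number here is an illustrative assumption,
# not a figure reported by the surveyed enterprises.

def ring_allreduce_bytes_per_gpu(param_count: float, bytes_per_param: int, n_gpus: int) -> float:
    """Bytes each GPU must send (and receive) per ring all-reduce of the gradients."""
    gradient_bytes = param_count * bytes_per_param
    return 2 * (n_gpus - 1) / n_gpus * gradient_bytes

params = 7e9          # assumed 7B-parameter model
bytes_per_param = 2   # fp16 gradients
n_gpus = 16           # assumed cluster size
step_seconds = 0.5    # assumed time per training step

bytes_per_step = ring_allreduce_bytes_per_gpu(params, bytes_per_param, n_gpus)
gbps = bytes_per_step * 8 / step_seconds / 1e9
print(f"~{gbps:.0f} Gbps of east-west traffic per GPU during gradient exchange")
# With these assumptions the result is roughly 400 Gbps, far beyond what a
# typical 10/25 GbE application fabric is provisioned to carry.
```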
The biggest data center network problem I heard from those with experience was that they believed they had built up more of an AI cluster than they needed. Running a popular LLM, they said, requires hundreds of GPUs and servers, but small language models can run on a single system, and a third of current self-hosting enterprises said it was best to start small, with small models, and build up only when you had experience and could demonstrate a need. This same group also pointed out that control was needed to ensure only truly useful AI applications were run. “Applications otherwise build up, exceed, and then increase the size of the AI cluster,” said users.
Every current AI self-hosting user said it was important to keep AI horizontal traffic off their primary data center network because of its potential congestion impact on other applications. Horizontal traffic from hosted generative AI can be enormous and unpredictable; one enterprise said their cluster could generate as much horizontal traffic as their whole data center, but in bursts rarely lasting more than a minute. They also said that latency within these horizontal bursts could hamper application value significantly, stretching out both result delivery and the length of the burst itself. They said that analyzing AI cluster flows was critical in picking the right cluster network hardware, and that they found they “knew nothing” about AI network needs until they ran trials and tests.
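That kind of flow analysis doesn’t have to be elaborate. As a minimal sketch, assuming flow telemetry exported as a CSV with hypothetical “timestamp” and “bytes” columns, simply bucketing the records by second is enough to expose how bursty the cluster’s traffic is relative to its average.

```python
# Minimal burst analysis over AI-cluster flow records. Assumes a hypothetical
# CSV export with 'timestamp' (epoch seconds) and 'bytes' columns; a real
# sFlow/NetFlow collector export would need its own parsing.
import csv
from collections import defaultdict

def bucket_traffic(path, bucket_seconds=1):
    """Sum observed bytes into fixed-width time buckets."""
    buckets = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            bucket = int(float(row["timestamp"])) // bucket_seconds
            buckets[bucket] += int(row["bytes"])
    return buckets

buckets = bucket_traffic("ai_cluster_flows.csv")   # hypothetical export file
rates_gbps = [b * 8 / 1e9 for b in buckets.values()]
peak, avg = max(rates_gbps), sum(rates_gbps) / len(rates_gbps)
print(f"peak {peak:.1f} Gbps, average {avg:.1f} Gbps, burstiness {peak / avg:.1f}x")
```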
The data relationship between the AI cluster and the enterprise’s core data repositories is complicated, and it’s this relationship that determines how much the AI cluster impacts the rest of the data center. The challenge here is that both the application(s) being supported and the manner of implementation have a major impact on how data moves from data center repositories to the AI cluster.
AI/ML applications of very limited scope, such as operations analysis in IT, networking, or security, are real-time and require access to real-time data, but this is usually low-volume telemetry, and users report it has little impact on the network. Generative AI applications targeting business analytics need broad access to core business data, but they often need primarily historical summaries rather than full transactional detail, which means it’s often possible to keep this condensed source data as a copy within the AI cluster.
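As a minimal sketch of that “condensed copy” approach, assuming hypothetical transaction columns (“order_date”, “region”, “amount”) and file paths, the summaries could be built once from an extract of the core repository and then parked on storage local to the AI cluster, so AI queries never touch the transactional systems directly.

```python
# Minimal sketch of condensing transactional data into historical summaries
# that can be copied into the AI cluster's own storage. The column names
# ('order_date', 'region', 'amount') and file paths are hypothetical.
import pandas as pd

def build_monthly_summary(transactions: pd.DataFrame) -> pd.DataFrame:
    """Collapse row-level transactions into per-region monthly totals."""
    t = transactions.copy()
    t["month"] = pd.to_datetime(t["order_date"]).dt.to_period("M").astype(str)
    return (t.groupby(["region", "month"], as_index=False)
             .agg(total_amount=("amount", "sum"), order_count=("amount", "count")))

# Build the summary from the core repository's extract, then store the much
# smaller result locally to the AI cluster.
transactions = pd.read_parquet("core_repository_extract.parquet")   # hypothetical
build_monthly_summary(transactions).to_parquet("ai_cluster/monthly_summary.parquet")
```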