An Intranet Search “Like Google”: What Is Possible Today, and Where Are the Opportunities and Limits?
- by Alwin Lösche
- Artificial Intelligence
When employees are asked what makes a good intranet, what is missing in their current solution, and what could be improved, the answers usually share an overarching theme. Many ask for a “very well-functioning search engine”, often with the emphasis on something “like Google”. But what exactly does “like Google” mean, and what is possible and useful today in an internal intranet search? Essentially, users bring expectations shaped by the international search giant to any search function, or simply assume certain features as standard. The following three aspects compare Google with current enterprise search technologies:
Completeness and Simplicity
Google search gives a sense of completeness. Whether this is due to its extremely high market share on the Internet and the associated mindset of “what is not on Google does not exist”, or to the actual size of the search index, is open to debate. Either way, it gives users the impression of completeness and simplicity: all they have to do is search in Google to find everything.
In stark contrast, intranet search engines are nearly always perceived as incomplete, largely because not all data sources are connected and information has to be searched for in multiple places. This also applies to the new collaboration tools and social intranets, which often become yet another data silo in the company once the initial euphoria fades. Poor search result quality also increases uncertainty for users, who may ultimately question whether the content even exists. The aim of a modern intranet search should be to connect as many data pools as possible and make them searchable centrally in one place. The biggest hurdle in achieving this ideal typically lies in company-specific authorization concepts, which can differ between the source systems but must be interpreted correctly by the search. Depending on the existing IT structure in the company, these adjustments can become cost drivers. For this reason, interfaces for the most common business applications are already included at narvika, and over 100 additional interfaces are prepared to enable quick and inexpensive implementation.
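To illustrate why authorization concepts matter for a central index, here is a minimal sketch of permission-aware filtering. It assumes that each source system's access rules have already been translated into a flat set of group names at indexing time. The `Document` class, the `search` function, the source names and the group names are all hypothetical and do not describe narvika or any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str                # hypothetical source system name, e.g. "wiki"
    text: str
    allowed_groups: frozenset  # groups allowed to read, normalized at index time


def search(index, query, user_groups):
    """Return ids of documents that match every query term AND are readable by the user."""
    terms = query.lower().split()
    hits = []
    for doc in index:
        # Skip anything the user could not open in the source system.
        if not (doc.allowed_groups & user_groups):
            continue
        if all(term in doc.text.lower() for term in terms):
            hits.append(doc.doc_id)
    return hits


# Hypothetical sample data from two connected sources.
INDEX = [
    Document("w1", "wiki", "Travel expense policy", frozenset({"all-staff"})),
    Document("f1", "fileshare", "Travel budget for the board", frozenset({"management"})),
]

staff_hits = search(INDEX, "travel", frozenset({"all-staff"}))
manager_hits = search(INDEX, "travel", frozenset({"all-staff", "management"}))
```

The same query yields different hit lists for different users. The costly part in practice is usually the step this sketch takes for granted: mapping each source system's own permission model onto one common representation.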
Search Result Quality & Personalization
The user expects relevant, personally fitting content immediately, regardless of how many hits a search query returns. The search engine decides which content suits the individual query best and shows the hits in order of relevance. This ranking is determined by a complex set of ranking factors and can be optimized along them.
The most obvious ranking factor is the data quality of the content concerned (classic on-page). The search engine must decide which content is “more valuable”, i.e. more relevant, within a certain context. The best practice here is to avoid redundant content and to ensure the best possible keywording of the content. For Google, content is prepared and optimized in a very time-consuming and costly manner. With intranet content, this effort is usually neglected, and catching up on existing content is too time-consuming. As more and more content is added, this regularly leads to a decline in search quality, because the search engine has increasing difficulty identifying the relevant content. Current search technologies can classify existing data using machine learning and, for example, use the results for automatic keywording.
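As a rough illustration of automatic keywording, the sketch below suggests keywords with a plain TF-IDF score. This is a simple statistical stand-in, not the machine learning classification a current enterprise search engine would use. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def suggest_keywords(docs, top_n=3):
    """Suggest up to top_n keywords per document using TF-IDF.

    docs maps a document id to its text. Terms that occur in every
    document get a score of zero and sink to the bottom of the ranking.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    n_docs = len(docs)
    suggestions = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        suggestions[doc_id] = ranked[:top_n]
    return suggestions


# Hypothetical intranet documents.
DOCS = {
    "a": "vacation request form vacation policy",
    "b": "expense report form expense policy",
}
keywords = suggest_keywords(DOCS, top_n=1)
```

Words shared by both documents, such as "form" and "policy", carry no distinguishing weight, so the term that sets each document apart is suggested instead.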
External Factors (classic off-page)
Details about Google's algorithm are kept under tight wraps and are repeatedly (and controversially) discussed in the SEO scene. Among other factors, high-quality external links (backlinks) to relevant content, mentions (social signals), and the actual search entries are known to be key contributors. Based on these factors, an attempt is made to classify the relevance of websites both in general and for specific topics. With the usual intranet search engine offerings, these signals and assessments are almost completely unavailable.
How satisfied is a user with the search result or with the content behind a particular hit? Although the details are debated among professionals, one can assume that Google analyzes user satisfaction with the displayed hit content and draws conclusions about the relevance of the content to the search query.
In intranet search, simple click counters can usually be implemented, which rank content according to how often it is clicked. However, reality tends to be much more complex, and simple click counters can falsely reinforce themselves, since no real feedback on the target content is processed. Common web analytics metrics should therefore be treated with caution in search engines. Different search patterns make this harder still. At Google, the "first pattern search" predominates, i.e. finding the first available content, whereas searches for "known items" are more frequent on the intranet, i.e. searching for specific, known content that can ideally be reached immediately. Little interaction with the search application can therefore be a very good sign (the user found what they were looking for immediately) or a very bad one (the user could not find their way around at all).
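The self-reinforcing effect of raw click counters can be made concrete with a small sketch. One possible mitigation, assumed here and not taken from any specific product, is to relate clicks to how often a hit was shown and to smooth the ratio with a prior. The function name, the prior values and the sample events are hypothetical. This dampens the feedback loop but still says nothing about whether the clicked content actually helped.

```python
from collections import defaultdict


def click_scores(events, prior_clicks=1.0, prior_impressions=10.0):
    """Score documents by smoothed click-through rate instead of raw clicks.

    events is an iterable of (doc_id, was_clicked) pairs, one pair for each
    time a document appeared in a hit list. The prior values are assumptions
    that keep rarely shown documents from getting extreme scores.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for doc_id, was_clicked in events:
        shown[doc_id] += 1
        if was_clicked:
            clicked[doc_id] += 1
    return {
        doc_id: (clicked[doc_id] + prior_clicks) / (shown[doc_id] + prior_impressions)
        for doc_id in shown
    }


# Hypothetical log: "top" is shown often because it already ranks first,
# "buried" is shown rarely but clicked almost every time it appears.
events = (
    [("top", True)] * 30 + [("top", False)] * 70
    + [("buried", True)] * 8 + [("buried", False)] * 2
)
scores = click_scores(events)
```

A raw counter would rank "top" first with 30 clicks against 8. Relative to how often each hit was shown, "buried" is the stronger result.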
Search queries are still strongly "keyword based", so the search engine usually receives only a few keywords without any further context. For example, "Bark" can refer to either the bark of a dog or the bark of a tree: the search query is identical, but the context is entirely different. Google undeniably has so much user data (through the search queries themselves, but also through the spread of the Google Chrome browser and the Android operating system) that the individual search intent can be estimated automatically in a much larger context. Results can then be personalized based on the most likely search intent.
This is much more difficult for an internal intranet search, as only a fraction of user behavior can and may be recorded and analyzed. However, current enterprise search engines use artificial intelligence to offer the choice of context as a subsequent step in the search process. In the "Bark" example, the user is shown visually that there are hits in the domestic animal area as well as in the garden accessories area. The search intent is not actively analyzed, but the user can refine the search to the desired subject area in a targeted manner.
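The "Bark" refinement can be sketched as a faceted search. The sketch assumes that every document already carries a category, for instance one assigned by an upstream classifier. The sample documents, category names and function name are hypothetical.

```python
from collections import Counter

# Hypothetical documents that already carry a category label.
DOCS = [
    {"id": 1, "title": "Why dogs bark at night", "category": "Domestic animals"},
    {"id": 2, "title": "Bark mulch for flower beds", "category": "Garden accessories"},
    {"id": 3, "title": "Anti-bark training collar", "category": "Domestic animals"},
    {"id": 4, "title": "Lawn mower manual", "category": "Garden accessories"},
]


def search_with_facets(query, docs, category=None):
    """Return matching documents plus hit counts per category.

    The facet counts always describe the unrefined result set, so the user
    can still see the other subject areas after picking one.
    """
    needle = query.lower()
    hits = [doc for doc in docs if needle in doc["title"].lower()]
    facets = Counter(doc["category"] for doc in hits)
    if category is not None:
        hits = [doc for doc in hits if doc["category"] == category]
    return hits, dict(facets)


all_hits, facets = search_with_facets("bark", DOCS)
garden_hits, _ = search_with_facets("bark", DOCS, category="Garden accessories")
```

The engine does not guess the intent. It reports that "bark" has hits in two subject areas and leaves the choice to the user.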
User Behavior and Artificial Intelligence
Essentially there is no comparison here, since the intranet search naturally has a limited amount of data, especially when it comes to analyzing user behavior. Data protection plays a central role here and data collection on user behavior is severely restricted, which in turn reduces the basis for the possibilities of machine learning. On the other hand, an intranet search (also with AI) can be optimized and trained much more specifically for the company content, so that many of the weaknesses that emerge in a direct comparison can be efficiently compensated for.
Search Management & Content Strategy
The great opportunity of intranet search compared to Google optimization lies in transparency. How users interact on (and with) Google is largely opaque to the content creator of a website. This interaction can only be estimated to a very limited extent via the content creator’s own web analytics or indirectly via service providers, and only this blurred picture is available for the creator's content strategies and optimizations.
The situation is quite different with the intranet search and your own content, where practically all data and options are available to you. Using search evaluations, all ranking factors, and filters means that content can be precisely controlled and optimized. Insights into the most searched keywords and search queries without any hits form the basis for initial optimizations.
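A minimal sketch of such a search evaluation follows. It assumes the search application logs each query together with the number of hits returned. The log format and the function name are hypothetical.

```python
from collections import Counter


def summarize_search_log(log, top_n=3):
    """Return the most frequent queries and the most frequent zero-hit queries.

    log is an iterable of (query, hit_count) pairs. Queries are normalized
    (lowercased, whitespace collapsed) so that trivial variants count together.
    """
    queries = Counter()
    zero_hits = Counter()
    for query, hit_count in log:
        normalized = " ".join(query.lower().split())
        queries[normalized] += 1
        if hit_count == 0:
            zero_hits[normalized] += 1
    return queries.most_common(top_n), zero_hits.most_common(top_n)


# Hypothetical log entries.
LOG = [
    ("Vacation form", 12),
    ("vacation  form", 12),
    ("canteen menu", 0),
    ("Canteen Menu", 0),
    ("vpn setup", 3),
]
top_queries, top_zero_hit_queries = summarize_search_log(LOG, top_n=2)
```

Zero-hit queries that recur, such as "canteen menu" in this example, point either to missing content or to content that uses different wording than the users do.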
Timeliness & Speed
Search results are never up to date in real time, not even on Google. Search hits always relate to a cached state of a piece of content, which the search engine must refresh regularly. The crawler goes through all content at regular intervals and updates the saved state if necessary. This indexing process includes further steps, such as full-text analysis, keywording, and the generation of preview images. Understandably, this costs a lot in both time and resources. Indexing intervals can therefore vary widely and tend to correlate with the visibility of a website on Google, so that relevant websites or certain areas of a website are updated more frequently than others.
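The "update the saved state if necessary" step can be sketched with a content hash, so that the expensive analysis only runs for content that actually changed. This is one common approach, assumed here for illustration. The function name and the sample URLs are hypothetical.

```python
import hashlib


def refresh_index(index, fetched):
    """Update stored content hashes and return the URLs that need re-analysis.

    index maps a URL to the hash of the version that was last indexed.
    fetched maps a URL to the text the crawler has just retrieved.
    Only new or changed content has to go through full-text analysis,
    keywording and preview generation again.
    """
    changed = []
    for url, text in fetched.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if index.get(url) != digest:
            index[url] = digest
            changed.append(url)
    return changed


index = {}
first_run = refresh_index(index, {"/news": "Version 1", "/faq": "Version 1"})
second_run = refresh_index(index, {"/news": "Version 1", "/faq": "Version 2"})
```

On the first run everything is new. On the second run only the page whose text changed is handed to the costly indexing steps.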
In terms of management and execution of website crawling, Google is beyond comparison, but in the intranet, in addition to websites, many data sources can be connected directly via technical interfaces and APIs, which simplifies and significantly accelerates these processes. For example: The crawling of 5,000 subpages of a website can take several hours, while the same number of database entries can be updated within minutes or even seconds.
Search results immediately! A search function works best when you don't notice it at all. An even less visible feature is the performance of the search function, which directly affects the loading time of the search hits. As a clear success factor, Google monitors and optimizes loading times in the millisecond range. Users who are (urgently) looking for certain information have an extremely low tolerance for sluggish hit lists and redundant "clicking around". Whether you get hits in 2 seconds or in half a second is a noticeable difference and has a lasting effect on user satisfaction.
With its undisputed market power and clear focus on web search, Google search cannot be compared in many ways with a current intranet search. In particular, the interplay of the world's best technology and the most extensive user data enables profound optimizations that continuously improve an already excellent user experience.
The good news, however, is that many of the weaknesses that emerge in a direct comparison can be compensated for very well with a current intranet search, and in compliance with data protection regulations. A comprehensive intranet search can find all content: not just websites, but also the Excel spreadsheet that has been zipped three times over and stored in the farthest corner of the server, all your emails, and your last post on the social intranet. In addition, you have the entire technology and all factors "in your own hands", so that many settings within the search engine can be adapted precisely to the company. You can decide for yourself which content is relevant and thus analyze and improve the quality of the search results yourself. Using machine learning methods, very large amounts of data can be classified and opened up to a context-related search that goes beyond specific keywords.
Even if Google cannot be compared with an intranet search, the underlying desire to make internal content searchable “like Google” is very realistic to implement. How and at what price one can meet this requirement for completeness and quality depends heavily on the company's existing IT applications. With narvika you can start at a fixed price, which even includes the most popular interfaces, and add additional source systems to the search function if required.