Previously I discussed the Plexi Adaptor framework for the Google Search Appliance. Adaptors can provide a simple and elegant way to index a content repository. An Adaptor sits in front of a repository and makes it behave like a web site from the GSA’s perspective. This offloads state management and queue management to the GSA’s built-in web crawler, which simplifies the implementation.
But I also mentioned that I was leery of Adaptors. I have written dozens of Connectors, and I like the control that the Connector Manager framework affords. Some of the Connectors I have written do not seem transferable to the Adaptor methodology. Here are a few reasons why I am not completely letting go of Connectors.
Change Detection
When I implement Connectors that have complex hierarchical ACLs or that require joining multiple database tables, I often have to track the state of multiple objects to do change detection. For example, to compute the ACL for a single Issue, our Atlassian JIRA Connector takes several objects into consideration, including the Project, Permission Schemes, Issue Schemes and Custom Attributes. A change in any of these objects can ripple through the entire repository. Implementing stateless change detection with a Retriever would be very difficult, because these permission objects, and the complex interactions between them, do not carry timestamps that reveal modifications. Instead, we store a snapshot of the permissions for each object in our Connector, which lets us quickly check for even subtle changes to the permissions. Nothing prevents an Adaptor from storing this kind of state information, but doing so goes against the recommended design.
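The snapshot idea can be illustrated with a small sketch. This is not the code from our JIRA Connector and it does not use the Connector Manager API; the function names (`acl_fingerprint`, `detect_changes`) and the dictionary fields are hypothetical, chosen only to show how hashing every input to an ACL lets you spot changes in objects that have no timestamps.

```python
import hashlib
import json


def acl_fingerprint(issue, project, permission_scheme):
    """Hash every input that contributes to one issue's effective ACL.

    The field names below are illustrative assumptions, not JIRA's schema.
    """
    payload = {
        "issue_security_level": issue.get("security_level"),
        "project_roles": project.get("roles", {}),
        "scheme_grants": permission_scheme.get("grants", []),
    }
    # sort_keys gives a stable serialization, so equal inputs hash equally.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def detect_changes(snapshot, current):
    """Compare stored fingerprints with freshly computed ones.

    Both arguments map an issue key to its fingerprint.
    Returns two sorted lists: keys that are new or changed, and keys
    that were in the snapshot but no longer exist.
    """
    changed = sorted(k for k, fp in current.items() if snapshot.get(k) != fp)
    deleted = sorted(k for k in snapshot if k not in current)
    return changed, deleted
```

On each traversal, a connector built this way would recompute the fingerprints, feed the changed and deleted keys to the indexer, and then persist the new map as the next snapshot. Because the hash covers the project roles and the scheme grants, editing a role membership changes the fingerprint of every issue in that project, even though the issues themselves were never touched.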
Sequential Iteration