Showing posts with label indexing-connector. Show all posts

Monday, October 15, 2012

PowerShell HowTo: Creating managed properties and adding crawled properties

In the previous posts I showed how you deploy a Custom Indexing Connector (or Search Connector) to SharePoint and how you create a content source that makes use of it.

In this post I will give a quick introduction to Crawled Properties and explain how to create a new Managed Property using PowerShell as well as how to add crawled properties to it.

Where do crawled properties come from?

I won't go into the definition of Managed Properties and Crawled Properties; you can bing that for yourself. But I will show you where crawled properties come from (with a custom indexing connector in mind).

Why do I want to know?

You need crawled and managed properties to improve the search experience for your users.

After setting up the indexing connector you need to start a crawl on your external system (if you didn't already do it: start it now and check the Crawl Log for success). The connector will index content it finds (so your users can search for it) and it will create Crawled Properties. These come from your BDC.

So let's keep in mind that we want to improve the search experience. And this can be done by creating Search Refiners centered around BDC entities. And we ultimately need those crawled properties to do this.

It's in the BDC

Let's assume you are crawling a financial LOB system and your BDC model file contains an entity Order which has the following structure:
<TypeDescriptor Name="Order" TypeName="PurchasingConnector.Entities.Order, PurchasingConnector, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0000000000000000">
  <TypeDescriptors>
    <TypeDescriptor Name="OrderNo" TypeName="System.String" IdentifierEntityNamespace="Purchasing" IdentifierEntityName="Order" IdentifierName="OrderNo" />
    <TypeDescriptor Name="Customer" TypeName="System.String" />
    <TypeDescriptor Name="OrderDate" TypeName="System.DateTime" />
  </TypeDescriptors>
</TypeDescriptor>
After doing the first full crawl on your external system your list of crawled properties should have been expanded. Have a look - go to Central Administration -> Search Service Application -> Queries and Results -> Metadata Properties:

Search Service Application - Metadata Properties

In the top menu click Crawled Properties:

This will list all crawled properties including those from your BDC model:
  • Order.OrderNo(Text)
  • Order.Customer(Text)
  • Order.OrderDate(Date and Time)
(You probably have to search a bit as they hide among the masses of OOB crawled properties.)
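Instead of scrolling through the list in Central Administration, the crawled properties can also be filtered from PowerShell. The following is a minimal sketch, assuming the SharePoint snap-in is loaded and that your crawled properties share a common name prefix such as "Order." as in the example above. The function name Find-CrawledPropertyByPrefix is made up for this illustration.

```powershell
# Sketch: list crawled properties whose name starts with a given prefix.
# Assumes: Add-PSSnapin Microsoft.SharePoint.PowerShell has been run.
# Find-CrawledPropertyByPrefix is a hypothetical helper name.
function Find-CrawledPropertyByPrefix
{
    param ($ssa, [string] $prefix)

    # -Limit All asks for every crawled property instead of a truncated list
    Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Limit All |
        Where-Object { $_.Name -like "$prefix*" } |
        Select-Object Name, Propset, VariantType
}

# Usage (on a SharePoint server):
# $ssa = Get-SPEnterpriseSearchServiceApplication
# Find-CrawledPropertyByPrefix -ssa $ssa -prefix "Order."
```

The Propset column is worth noting down: it is the Property Set GUID that is needed later when mapping the crawled property to a managed property.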

Refiners the way we want them

Ultimately, we want to create a refiner allowing us to filter for the customer of an order.
Search Refiner

To do this we need a managed property (as the tutorials on creating search refiners will tell you). And that is exactly what we're going to create now using PowerShell: a managed property.

Time to create Managed Properties and add Crawled Properties

To stick with the above example we create a managed property named Customer:
$ssaGlobal = Get-SPEnterpriseSearchServiceApplication
$schemaGlobal = New-Object -TypeName Microsoft.Office.Server.Search.Administration.Schema -ArgumentList $ssaGlobal

# check if property already exists - in this case we cancel the operation
# (errors for a missing property are suppressed, so $property is simply empty then)
$property = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssaGlobal -Identity "Customer" -ErrorAction SilentlyContinue
if ($property)
{
    Write-Host -f Red "Cannot create managed property because it already exists"
    exit
}

# create managed property with name "Customer" of type "Text"
$property = $schemaGlobal.AllManagedProperties.Create("Customer", "Text")
# set description; there are other properties you could set here
$property.Description = "Customer of Order (Managed Property created from PowerShell)"
That's basically it. But the managed property is still empty. We need to add a mapping for our crawled property. Here is how:
# this is the "Property Set GUID"; it is also used by the custom indexing connector so this is where you need to get it from
$propSetGuidGlobal = "{00000000-0815-0000-0000-000000000000}"
$textVariantType = 31
$mappings = $property.GetMappings()

# try to map crawled property - if the crawled property doesn't exist nothing bad happens, it will simply be ignored
$mapping = New-Object -TypeName Microsoft.Office.Server.Search.Administration.Mapping -ArgumentList $propSetGuidGlobal, "Order.Customer", $textVariantType, $property.PID
$mappings.Add($mapping)

$property.SetMappings($mappings)
$property.Update()
Note that you need to specify the Property Set GUID of the Property Set that contains the crawled property. For the OOB crawled properties in SharePoint these are documented (somewhere). For a custom indexing connector this ID is defined inside the connector, so it should be included in the connector's documentation for ease of use.

Also note the type of the crawled property, 31, which means Text. This and more variant type identifiers are listed in this blog post.
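As an alternative to the object model calls above, the same result can probably be reached with the search cmdlets that ship with SharePoint 2010. This is a sketch rather than the method used in this post: it assumes the crawled property already exists (i.e. a full crawl has run), and the function name New-MappedManagedProperty is invented for the example. Use either this or the object model version, not both, because creating a managed property that already exists will fail.

```powershell
# Sketch: create a Text managed property and map one crawled property to it.
# Assumes: the SharePoint snap-in is loaded and the crawled property exists.
# New-MappedManagedProperty is a hypothetical helper name.
function New-MappedManagedProperty
{
    param ($ssa, [string] $managedName, [string] $crawledName)

    # a name may exist in more than one property set, so take the first hit
    $crawled = Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Name $crawledName |
        Select-Object -First 1
    if (-not $crawled)
    {
        throw "Crawled property not found: $crawledName"
    }

    # -Type 1 means Text
    $managed = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Name $managedName -Type 1
    New-SPEnterpriseSearchMetadataMapping -SearchApplication $ssa -ManagedProperty $managed -CrawledProperty $crawled | Out-Null
    return $managed
}

# Usage (on a SharePoint server):
# $ssa = Get-SPEnterpriseSearchServiceApplication
# New-MappedManagedProperty -ssa $ssa -managedName "Customer" -crawledName "Order.Customer"
```

Because the crawled property object is looked up first, the Property Set GUID and the variant type do not have to be typed in by hand here.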

Now go and crawl!

After creating or modifying managed properties you have to do a full crawl. Otherwise your managed property won't work as expected.
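The full crawl can be started from PowerShell as well. The sketch below assumes a content source named "My Content Source" (replace it with the name of your own content source) and only starts a crawl if the content source is currently idle. The function name Start-FullCrawlIfIdle is made up for this example.

```powershell
# Sketch: start a full crawl on a content source if no crawl is running.
# Assumes: the SharePoint snap-in is loaded.
# Start-FullCrawlIfIdle is a hypothetical helper name.
function Start-FullCrawlIfIdle
{
    param ($ssa, [string] $contentSourceName)

    $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $contentSourceName
    if ($cs.CrawlStatus -ne "Idle")
    {
        Write-Host -f Yellow "Content source is busy ($($cs.CrawlStatus)), not starting a crawl"
        return $false
    }

    $cs.StartFullCrawl()
    Write-Host -f Green "Full crawl started: $contentSourceName"
    return $true
}

# Usage (on a SharePoint server):
# $ssa = Get-SPEnterpriseSearchServiceApplication
# Start-FullCrawlIfIdle -ssa $ssa -contentSourceName "My Content Source"
```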

Thursday, October 11, 2012

SharePoint 2010 SP1 changes the account used for indexing content. Not.

After updating SharePoint 2010 from RTM state (14.0.4763.1000) to Service Pack 1 (SP1, 14.0.6029.1000) a content source of the Search Service Application (SSA) suddenly stopped indexing content.

The content source in question was of type CustomRepository and used a custom indexing connector to access an external system. It downloaded data via web service. And this suddenly seemed to fail.

What was going on?

Somebody cannot access something

A look into the ULS log revealed errors which happened every time the connector tried to access the web service. The crawl history showed a single top level error and the crawl log had the following entry:

"Error while crawling LOB contents. ( Credentials were not found for the current user within the target application '...'. please set the credentials for the current user. )"
The error message pointed in one direction: the Secure Store. All credentials for accessing the external web service were saved in the Secure Store, and one account was allowed to get these credentials. The message suggested that an account other than the allowed one was trying to get the credentials.

But what is the "current" user? Shouldn't the user be the Default Content Access Account of the SSA as configured in the Crawl Rules?

Identity crisis

After looking into the task manager I decided to give credential access to one account: the account mssdmn.exe runs under, which is the account of the SharePoint Server Search 14 service.

And it seemed like

  • before updating to SP1 the Default Content Access Account (as configured in the Crawl Rules) was used to access the secure store credentials
  • after updating to SP1 this account apparently changed to the account of the SharePoint Server Search 14 service.

So the solution was simple, yet mysterious: I changed the account allowed to access the web service credentials. And it worked.
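The account that mssdmn.exe runs under, which I read from the task manager, can also be determined with a few lines of PowerShell. This is only a sketch: it relies on WMI, has to run on the crawl server, and will usually need an elevated prompt to see the owner of processes started by other accounts. The function name Get-ProcessOwner is invented for the example.

```powershell
# Sketch: show which account owns the running mssdmn.exe processes.
# Get-ProcessOwner is a hypothetical helper name.
function Get-ProcessOwner
{
    param ([string] $processName = "mssdmn.exe")

    Get-WmiObject -Class Win32_Process -Filter "Name='$processName'" | ForEach-Object {
        # GetOwner() returns the domain and user name of the process owner
        $owner = $_.GetOwner()
        New-Object PSObject -Property @{
            ProcessId = $_.ProcessId
            Account   = "$($owner.Domain)\$($owner.User)"
        }
    }
}

# Usage (on the crawl server, elevated):
# Get-ProcessOwner
```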

But stop!

Resolution? Confusion.

After a few days I deleted the content source previously affected and added it again. And the indexing stopped again. Same error as before: "Credentials were not found for the current user within the target application '...'. please set the credentials for the current user." What was going on this time?

The account used by Search to access the secure store credentials changed again. To what was set prior to installing SP1: the Default Content Access Account of the SSA. As one would expect.

Strange.

Monday, October 8, 2012

PowerShell HowTo: Deploying a Custom Indexing Connector to SharePoint - Part 2

Deploying a custom indexing connector in SharePoint requires two steps:
  • Adding a protocol used by SharePoint to call into the indexing connector
  • Registering the indexing connector with SharePoint
In an earlier post we already covered step one and added a protocol to SharePoint. Now it's time to tell SharePoint about our indexing connector. The protocol is associated with the connector by way of the Business Data Catalog (BDC) model file.

Register the indexing connector with SharePoint

The cmdlet used to register the connector is New-SPEnterpriseSearchCrawlCustomConnector which takes our previously registered protocol as parameter as well as the path to the BDC model file. The model file describes the structure of the external system's data. It also contains information about where to find our indexing connector. The connector will be used to handle URLs starting with the given protocol.
     
Function RegisterCustomConnector
{
    param ([Microsoft.Office.Server.Search.Administration.SearchServiceApplication] $ssa, [string] $protocol, [string] $displayName, [string] $modelFilePath) 
    Write-Host "Registering custom connector: $displayName"
    $connector = New-SPEnterpriseSearchCrawlCustomConnector -SearchApplication $ssa -protocol $protocol -ModelFilePath $modelFilePath -Name $displayName
    if ($connector)
    {
      Write-Host -f Green "Successfully registered custom connector: $displayName"
    } else
    {
      throw "Registering custom connector failed: $displayName"
    }
}

$ssa = Get-SPEnterpriseSearchServiceApplication
RegisterCustomConnector -ssa $ssa -protocol "protocol1" -displayName "Connector" -modelFilePath "MyBDC.xml"
The documentation states that the protocol must have the format "protocol1://", but using just "protocol1" (without "colon slash slash") works just fine.

Example of a BDC model file where you can see the assembly and classes specified:
     
<Model name="MyModel" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/windows/2007/BusinessDataCatalog">
  <LobSystems>
    <LobSystem name="ContosoSystem" type="Custom">
      <Properties>
        <Property name="SystemUtilityTypeName" type="System.String">ConnectorNamespace.ContosoSystemUtility, ConnectorAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0000000000000000</Property>
        <Property name="InputUriProcessor" type="System.String">ConnectorNamespace.ContosoLobUri, ConnectorAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0000000000000000</Property>
        <Property name="OutputUriProcessor" type="System.String">ConnectorNamespace.ContosoNamingContainer, ConnectorAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0000000000000000</Property>
      </Properties>
      <!-- More content here -->
    </LobSystem>
  <!-- and here -->
  </LobSystems>
<!-- and here -->
</Model>
The assembly here is "ConnectorAssembly" which you have to deploy to the Global Assembly Cache (GAC).

(More about the attributes used in the BDC model file can be found on MSDN: "Search Properties for BDC model files".)
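Before registering the connector it can help to check which assemblies and classes the model file actually refers to, so you know what has to be in the GAC. The following sketch reads the LobSystem properties from a model file like the one shown above. It assumes the file layout from the example, and the function name Get-ConnectorTypeName is made up for this illustration.

```powershell
# Sketch: list the type names a BDC model file refers to.
# Assumes the model layout shown above (LobSystem/Properties/Property).
# Get-ConnectorTypeName is a hypothetical helper name.
function Get-ConnectorTypeName
{
    param ([string] $modelFilePath)

    if (-not (Test-Path -Path $modelFilePath))
    {
        throw "BDC model file not found: $modelFilePath"
    }

    $model = New-Object System.Xml.XmlDocument
    $model.Load((Resolve-Path -Path $modelFilePath).ProviderPath)

    # local-name() sidesteps the default XML namespace of the model file
    $xpath = "//*[local-name()='LobSystem']/*[local-name()='Properties']/*[local-name()='Property']"
    $model.SelectNodes($xpath) | ForEach-Object {
        New-Object PSObject -Property @{
            Name     = $_.GetAttribute("name")
            TypeName = $_.InnerText.Trim()
        }
    }
}

# Usage:
# Get-ConnectorTypeName -modelFilePath "MyBDC.xml"
```

Each returned TypeName is an assembly-qualified name, so the assembly to deploy can be read directly from it.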

Wednesday, October 3, 2012

PowerShell HowTo: Deploying a Custom Indexing Connector to SharePoint - Part 1

Deploying a custom indexing connector in SharePoint requires two steps:
  • Adding a protocol used by SharePoint to call into the indexing connector
  • Registering the indexing connector with SharePoint
In this post I will show how to accomplish the first step via PowerShell.

Both steps are also described on MSDN, but we go one step further and automate them completely using PowerShell.

Adding protocol and handler to the registry

Based on the protocol of a content source's start address (e.g. http or bdc3) SharePoint decides which indexing connector should crawl it. The protocols known to SharePoint are stored in the Registry on the server where the crawling will take place.

So, the protocol used by our indexing connector also needs to be added to the registry. The following script adds the protocol protocol1:
        
Function RegisterProtocolHandler
{
    param ([string] $protocol)
    
    $path = "HKLM:\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ProtocolHandlers\"
    Write-Host "Adding protocol handler to registry: $protocol" 
    # creates the property if not present, otherwise updates it
    Set-ItemProperty -path $path -name $protocol -value "OSearch14.ConnectorProtocolHandler.1" -ErrorAction Stop
    Write-Host -f Green "Successfully added protocol handler to registry: $protocol"
} 

RegisterProtocolHandler -protocol "protocol1" 
If you look at the registry afterwards you will see your protocol among the already registered ones:
HKLM:\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ProtocolHandlers
But where to get the name of the protocol from in the first place? It has to be included in the documentation of the indexing connector you want to deploy. This is decided by the creator of the indexing connector and could basically be anything like protocol1, abc or helloworld.
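Instead of opening the registry editor, the registration can be checked from a script. This is a minimal sketch that reads the value back from the same registry key used above and compares it with the handler name. The function name Test-ProtocolHandler is made up for this example.

```powershell
# Sketch: check whether a protocol is registered with the connector handler.
# Test-ProtocolHandler is a hypothetical helper name.
function Test-ProtocolHandler
{
    param ([string] $protocol, [string] $expected = "OSearch14.ConnectorProtocolHandler.1")

    $path = "HKLM:\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ProtocolHandlers"
    # returns nothing (instead of failing) if the value does not exist
    $entry = Get-ItemProperty -Path $path -Name $protocol -ErrorAction SilentlyContinue
    return (($entry -ne $null) -and ($entry.$protocol -eq $expected))
}

# Usage (on the crawl server):
# Test-ProtocolHandler -protocol "protocol1"
```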

Sunday, September 30, 2012

PowerShell HowTo: Create Content Source of Type CustomRepository


In this post I explain how to create a SharePoint content source of type CustomRepository via PowerShell. There are some pitfalls which I will highlight.

CustomRepository

Content sources of type CustomRepository are used to index external data sources which aren't supported by any of the built-in indexing connectors. These content sources are to be used in conjunction with Custom Indexing Connectors, which use a custom protocol to access the external system.

The property page of an already created content source of type CustomRepository looks like the one shown in Figure 1.


Edit Content Source of Type CustomRepository
Figure 1: Content source; properties relating to type CustomRepository are highlighted

You can see that a custom connector named Custom Protocol with scheme protocol1 is used to access the external system. How to register the custom connector will be the topic of another blog post.

Scripting

You can use PowerShell to script the creation of this type of content source. It is pretty straightforward if you know about the pitfalls.

Use the New-SPEnterpriseSearchCrawlContentSource cmdlet. Here is a working example:
        
  $ssa = Get-SPEnterpriseSearchServiceApplication
  $contentSource = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Type CustomRepository -Name "My Content Source" -CustomProtocol "protocol1" -StartAddresses "protocol1://localhost"

Pitfalls

The StartAddresses parameter is not really optional

Don't forget to specify the StartAddresses parameter! The cmdlet's documentation states that this parameter is optional. Well, this is true in the sense that skipping this parameter won't stop the content source from being created. But it will be broken:

Error: The custom connector used by the content source has been removed or undeployed.
Figure 2: "The custom connector used by the content source has been removed or undeployed."
The error message hints that the custom protocol used by the indexing connector is not available anymore. This is misleading. The correct error message would be "There are no start addresses provided and I am freaking out about it for no reason.".

There is also no way to correct this. The radio button next to Custom Protocol cannot be selected and the OK button of the dialog is disabled. And won't be enabled again. The only way to fix the content source is to delete it.

So remember to provide a start address when creating a content source via cmdlet. 

Don't use start addresses with empty host part

Using a start address like "protocol1://" (with empty host part) can also lead to the above error. I say "can" because this sometimes seemed to work, sometimes not.

So to be sure you should always specify a host in your start address.
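Both pitfalls can be guarded against with a small check that runs before the content source is created. The sketch below uses System.Uri to make sure a start address is present, uses the expected protocol and has a non-empty host part. The function name Test-StartAddress is made up for this example, and the check only covers the two pitfalls described here.

```powershell
# Sketch: validate a start address before creating the content source.
# Test-StartAddress is a hypothetical helper name.
function Test-StartAddress
{
    param ([string] $address, [string] $protocol)

    $uri = $null
    # fails for empty or malformed addresses
    if (-not [System.Uri]::TryCreate($address, [System.UriKind]::Absolute, [ref] $uri))
    {
        return $false
    }

    # the scheme has to match the custom protocol and the host must not be empty
    return (($uri.Scheme -eq $protocol) -and (-not [string]::IsNullOrEmpty($uri.Host)))
}

# Usage:
# if (-not (Test-StartAddress -address "protocol1://localhost" -protocol "protocol1"))
# {
#     throw "Invalid start address"
# }
```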