Babak Mahmoudi's Blog

Localization for Persian Language…

First Post


Persian text and English text for testing


Written by Babak Mahmoudi

July 17, 2017 at 9:47 am

Posted in Uncategorized

Shift in .Net handling of culture data


There has been a shift in how .Net 4 handles culture data. This may affect attempts to fix the Persian culture.

The shift is simple: prior to version 4, the CLR insisted on using culture data stored privately in a binary resource named “culture.nlp”; version 4 gave that up and uses the Windows API to get the data from the operating system. This is why some people have complained about a missing CultureTableRecord when using PersianCalendar.

Culture Data Prior to .Net 4

The following sequence shows how culture data were handled prior to version 4:

  1. The CultureInfo constructor calls CultureTableRecord.GetCultureTableRecord.
  2. The CultureTableRecord constructor uses CultureTable.Default.GetDataItemFromCultureName.
  3. The Default property returns the m_defaultInstance field, which is initialized in the class initializer.
  4. Finally, InitializeBaseInfoTablePointers loads “culture.nlp” from the assembly resources,
  5. where “culture.nlp” can be found in the mscorlib resources.
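On a given runtime you can check whether the core library still carries this table. The sketch below is an illustration, not part of the original investigation: it lists the manifest resources of the assembly that defines System.Object (mscorlib on the .NET Framework) and looks for a name ending in “culture.nlp”. Going by the shift described in this post, one would expect a match on .NET 2.0 to 3.5 and none on .NET 4 or later; the exact resource name matched is an assumption.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class CultureResourceProbe
{
    // Manifest resource names of the core library (mscorlib on the .NET Framework,
    // System.Private.CoreLib on .NET Core / .NET 5+).
    public static string[] CoreLibraryResources()
    {
        Assembly coreLib = typeof(object).Assembly;
        return coreLib.GetManifestResourceNames();
    }

    // True if the core library embeds a resource whose name ends in "culture.nlp".
    public static bool EmbedsCultureTable()
    {
        return CoreLibraryResources()
            .Any(name => name.EndsWith("culture.nlp", StringComparison.OrdinalIgnoreCase));
    }
}
```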


There’s also a CalendarTable class that takes virtually the same approach.



CultureData in .Net 4

The above approach was abandoned entirely in .Net 4. The CLR now prefers to retrieve culture data through the Windows API. For instance, the following excerpt is from a disassembly of the CalendarData code, where the CLR enumerates the optional calendars by calling EnumCalendarInfoExEx:



In short, .Net 4 changed how culture data are handled: the CLR now prefers to get culture data through the Windows API. This will affect how the Persian culture can be fixed, as I will describe in later posts.

Written by Babak Mahmoudi

September 14, 2011 at 9:11 am

What’s wrong with Persian culture in .Net?


This post was republished to Babak Mahmoudi’s Blog at 11:38:10 AM 08/22/2011


In this post, some mistakes in the implementation of the Persian culture in .Net are discussed and workarounds are proposed.


.Net provides enhanced globalization features, mostly based on its implementation of culture concepts. Programmers can use various aspects of these features to develop software ready for the global market. A class called CultureInfo plays a key role in this implementation: it is mainly used to get the necessary information about a specific culture, and programmers create instances of CultureInfo to access that information. The framework supports the Persian language too: one may use “fa-IR” to create a CultureInfo instance for the Persian language in Iran. But, as discussed here, there are a number of problems with this culture instance.

The most critical deficiency of the Persian culture concerns the Persian calendar. While Iranians use their own calendar, the Persian culture assumes they use the Arabic Hijri calendar. The following picture shows how CultureInfo assumes HijriCalendar for the Persian culture. Also note that PersianCalendar is not even included in OptionalCalendars.
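To see what a particular framework version and OS report for fa-IR, a few lines like the following are enough. This is a sketch (the class name is hypothetical); the result for fa-IR depends on the .NET version and the Windows locale data, so no output is shown. A call such as `CalendarReport.Describe(new CultureInfo("fa-IR"))` prints the default calendar and the optional calendars side by side.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class CalendarReport
{
    // Describes the default calendar and the optional calendars a culture reports.
    public static string Describe(CultureInfo culture)
    {
        string defaultCalendar = culture.Calendar.GetType().Name;
        string optional = string.Join(", ",
            culture.OptionalCalendars.Select(c => c.GetType().Name));
        return $"{culture.Name}: default={defaultCalendar}; optional=[{optional}]";
    }

    // True if PersianCalendar is among the culture's optional calendars.
    public static bool SupportsPersianCalendar(CultureInfo culture)
    {
        return culture.OptionalCalendars.Any(c => c is PersianCalendar);
    }
}
```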


Another problem with the Persian culture concerns calendar information such as day and month names: they are all Arabic ones:



So in order to have a better Persian CultureInfo one should:

· Find a way to set PersianCalendar as the culture's calendar.

· Correct month and day names.

Correcting Month and Day Names

Month and day names are stored in the DateTimeFormatInfo object exposed by the DateTimeFormat property of CultureInfo. On a writable culture (for example one created with new CultureInfo("fa-IR")) they can easily be fixed with code such as:

culture.DateTimeFormat.MonthNames = new string[] { "فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور", "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند", "" };
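The assignment above only works on a writable DateTimeFormatInfo: cultures returned by CultureInfo.GetCultureInfo, and CultureInfo.InvariantCulture, are read-only, and setting MonthNames on them throws InvalidOperationException. A sketch that works on a clone instead (class and method names are hypothetical); the genitive arrays are set too, since .NET uses them for some patterns that contain a day number. Note that changing names does not change calendar arithmetic: with a Gregorian calendar, month 1 (January) would simply be labelled فروردین.

```csharp
using System;
using System.Globalization;

public static class PersianNames
{
    // 13 entries: .NET month-name arrays always carry a 13th slot (empty for 12-month calendars).
    public static readonly string[] MonthNames =
    {
        "فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور",
        "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند", ""
    };

    // Returns a writable copy of source with Persian month names applied.
    // Clone() is writable even when the source is read-only.
    public static DateTimeFormatInfo WithPersianMonthNames(DateTimeFormatInfo source)
    {
        var info = (DateTimeFormatInfo)source.Clone();
        info.MonthNames = MonthNames;
        info.AbbreviatedMonthNames = MonthNames;
        info.MonthGenitiveNames = MonthNames;
        info.AbbreviatedMonthGenitiveNames = MonthNames;
        return info;
    }
}
```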

Using Persian Calendar

Using the Persian calendar is not as straightforward as setting month names. Both CultureInfo and DateTimeFormatInfo include a Calendar property, and to get proper Persian date formatting one should set these calendars to PersianCalendar. One might expect to simply set the Calendar property:

culture.DateTimeFormat.Calendar = new PersianCalendar();

But the setter of DateTimeFormatInfo.Calendar rejects this value, because PersianCalendar is not included in the OptionalCalendars of the Persian culture. One may use reflection to bypass the setter and write the private calendar field directly:

FieldInfo dateTimeFormatInfoCalendar = typeof(DateTimeFormatInfo).GetField("calendar",
    BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance);
dateTimeFormatInfoCalendar.SetValue(info, new PersianCalendar());

Here info is a DateTimeFormatInfo. Note how reflection is used to set the private field “calendar” of a DateTimeFormatInfo object; this bypasses the setter's check against OptionalCalendars.

Putting it all together, a candidate method for fixing the DateTimeFormatInfo could be:

using System;
using System.Globalization;
using System.Reflection;

// Rewrites a DateTimeFormatInfo for Persian: optionally forces PersianCalendar (bypassing the
// OptionalCalendars check in the Calendar setter) and sets Persian day/month names and patterns.
// Relies on the private .NET Framework fields "m_isReadOnly" and "calendar".
public static void FixPersianDateTimeFormat(DateTimeFormatInfo info, bool usePersianCalendar)
{
    if (info == null)
        return;
    const BindingFlags flags = BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance;
    FieldInfo dateTimeFormatInfoReadOnly = typeof(DateTimeFormatInfo).GetField("m_isReadOnly", flags);
    FieldInfo dateTimeFormatInfoCalendar = typeof(DateTimeFormatInfo).GetField("calendar", flags);

    // Temporarily make a read-only instance writable.
    bool readOnly = (bool)dateTimeFormatInfoReadOnly.GetValue(info);
    if (readOnly)
        dateTimeFormatInfoReadOnly.SetValue(info, false);

    // Write the private field directly; the public setter would reject PersianCalendar.
    if (usePersianCalendar)
        dateTimeFormatInfoCalendar.SetValue(info, new PersianCalendar());

    // Day arrays start with Sunday (یکشنبه), following DayOfWeek order.
    info.AbbreviatedDayNames = new string[] { "ی", "د", "س", "چ", "پ", "ج", "ش" };
    info.ShortestDayNames = new string[] { "ی", "د", "س", "چ", "پ", "ج", "ش" };
    info.DayNames = new string[] { "یکشنبه", "دوشنبه", "سه‌شنبه", "چهارشنبه", "پنجشنبه", "جمعه", "شنبه" };
    info.AbbreviatedMonthNames = new string[] { "فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور", "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند", "" };
    info.MonthNames = new string[] { "فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور", "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند", "" };
    info.AMDesignator = "ق.ظ";
    info.PMDesignator = "ب.ظ";
    info.FirstDayOfWeek = DayOfWeek.Saturday;
    info.FullDateTimePattern = "yyyy MMMM dddd, dd HH:mm:ss";
    info.LongDatePattern = "yyyy MMMM dddd, dd";
    info.ShortDatePattern = "yyyy/MM/dd";

    // Restore the original read-only state.
    if (readOnly)
        dateTimeFormatInfoReadOnly.SetValue(info, true);
}


This fixes the DateTimeFormatInfo for the Persian calendar, and also fixes the month and day names.
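Where writing private fields is not acceptable, a date can still be shown in the Persian calendar by asking PersianCalendar for the year, month and day directly and formatting them yourself. This sketch (the class name is hypothetical) produces the same yyyy/MM/dd layout as the ShortDatePattern above; it does not change any CultureInfo, so standard format strings such as "d" are unaffected. For example, 22 August 2011 becomes 1390/05/31.

```csharp
using System;
using System.Globalization;

public static class PersianDateFormatter
{
    private static readonly PersianCalendar Calendar = new PersianCalendar();

    private static readonly string[] MonthNames =
    {
        "فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور",
        "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند"
    };

    // yyyy/MM/dd in the Persian calendar.
    public static string ToShortDate(DateTime date)
    {
        int year = Calendar.GetYear(date);
        int month = Calendar.GetMonth(date);
        int day = Calendar.GetDayOfMonth(date);
        return string.Format(CultureInfo.InvariantCulture, "{0:0000}/{1:00}/{2:00}", year, month, day);
    }

    // "day month-name year" in the Persian calendar.
    public static string ToLongDate(DateTime date)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0} {1} {2}",
            Calendar.GetDayOfMonth(date),
            MonthNames[Calendar.GetMonth(date) - 1],
            Calendar.GetYear(date));
    }
}
```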

Fixing Optional Calendars

An alternative, and more challenging, approach would be to add the Persian calendar as an optional calendar. This requires more detailed knowledge of how CultureInfo manages locale-specific information. In fact, CultureInfo retrieves culture data from complicated data structures stored in locale files under the Windows operating system. Data such as the array of optional calendars are stored in specific data structures and retrieved by pointer manipulation. The following code shows how CultureInfo retrieves OptionalCalendars from a CultureTableRecord:

internal int[] OptionalCalendars
{
    get
    {
        if (this.optionalCalendars == null)
            this.optionalCalendars = this.m_cultureTableRecord.IOPTIONALCALENDARS;
        return this.optionalCalendars;
    }
}

In CultureTableRecord, the IOPTIONALCALENDARS property in turn returns:

        return this.GetWordArray(this.m_pData.waCalendars);

GetWordArray finally reads the optional calendars from the data pool:

private unsafe int[] GetWordArray(uint iData)
{
    if (iData == 0)
        return new int[0];
    // iData is an offset, in ushorts, into the data pool.
    ushort* numPtr = this.m_pPool + iData;
    // The first ushort is the element count; the values follow it.
    int num = numPtr[0];
    int[] numArray = new int[num];
    for (int i = 0; i < num; i++)
        numArray[i] = numPtr[i + 1];
    return numArray;
}

Note the pointer arithmetic involved: the ushort at offset waCalendars in the pool holds the number of optional calendars, and the calendar identifiers follow it.
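The layout GetWordArray walks can be mimicked in managed code to make the pointer arithmetic explicit. In this sketch (a hypothetical helper, not framework code) the pool is a ushort[], an offset of 0 means “no data”, pool[offset] holds the count and the values follow. Patch is the managed analogue of writing 0x16 (the Windows calendar ID of the Persian calendar) into the list, which is what the fix described next does against unmanaged memory. In the example pool { 0, 0, 0, 2, 1, 6 } at offset 3, the two calendars are 1 (Gregorian) and 6 (Hijri).

```csharp
using System;

public static class WordArrayLayout
{
    // Reads a length-prefixed ushort array starting at offset.
    public static int[] ReadWordArray(ushort[] pool, uint offset)
    {
        if (offset == 0)
            return new int[0];
        int count = pool[offset];
        int start = (int)offset + 1; // values start right after the count
        var result = new int[count];
        for (int i = 0; i < count; i++)
            result[i] = pool[start + i];
        return result;
    }

    // Overwrites entry `index` of the length-prefixed array at offset.
    public static void Patch(ushort[] pool, uint offset, int index, ushort value)
    {
        int count = pool[offset];
        if (index < 0 || index >= count)
            throw new ArgumentOutOfRangeException(nameof(index));
        pool[(int)offset + 1 + index] = value;
    }
}
```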

To fix the optional calendars of the Persian locale, one should write the Persian calendar identifier at the appropriate place in the locale data structure. This location can be worked out from the code above. Then, using reflection again to reach the private fields, one can get access to the array of optional calendars and fix it on the fly.

But there is still another problem: the array lies in a protected memory area, i.e. you have no write access to that part of memory. A workaround is to use VirtualProtect to make this memory writable before writing the optional calendars back:


using System;
using System.ComponentModel;
using System.Globalization;
using System.Reflection;
using System.Runtime.InteropServices;

[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool VirtualProtect(IntPtr lpAddress, UIntPtr dwSize, uint flNewProtect, out uint lpflOldProtect);

private const uint PAGE_READWRITE = 0x04;
private const short CAL_PERSIAN = 0x16; // Windows calendar ID 22

// Before .NET 4 only: replaces entry calendarIndex of the culture's optional-calendar list
// with the Persian calendar. InvokeHelper is the small reflection helper from the sample code
// (GetField reads private members); it is not part of the framework.
public static CultureInfo FixOptionalCalendars(CultureInfo culture, int calendarIndex)
{
    InvokeHelper ivCultureInfo = new InvokeHelper(culture);
    InvokeHelper ivTableRecord = new InvokeHelper(ivCultureInfo.GetField("m_cultureTableRecord"));

    // m_pData is a void* boxed as System.Reflection.Pointer; convert it through IntPtr(void*).
    Pointer m_pData = (Pointer)ivTableRecord.GetField("m_pData");
    ConstructorInfo intPtrCtor = typeof(IntPtr).GetConstructor(new Type[] { Type.GetType("System.Void*") });
    IntPtr dataIntPtr = (IntPtr)intPtrCtor.Invoke(new object[] { m_pData });

    // Marshal the unmanaged CultureTableData struct to a managed object so its fields can be read.
    Type cultureTableDataType = Type.GetType("System.Globalization.CultureTableData");
    object oCultureTableData = Marshal.PtrToStructure(dataIntPtr, cultureTableDataType);
    InvokeHelper ivCultureTableData = new InvokeHelper(oCultureTableData);

    // waCalendars is an offset, in ushorts, into the data pool pointed to by m_pPool.
    uint waCalendars = (uint)ivCultureTableData.GetField("waCalendars");
    Pointer m_pPool = (Pointer)ivTableRecord.GetField("m_pPool");
    IntPtr poolIntPtr = (IntPtr)intPtrCtor.Invoke(new object[] { m_pPool });

    // The first short at pool + offset is the number of optional calendars.
    IntPtr countPtr = new IntPtr(poolIntPtr.ToInt64() + waCalendars * sizeof(ushort));
    short[] count = new short[1];
    Marshal.Copy(countPtr, count, 0, 1);

    // The calendar IDs follow the count.
    short[] calArray = new short[count[0]];
    IntPtr calArrayPtr = new IntPtr(countPtr.ToInt64() + sizeof(short));
    Marshal.Copy(calArrayPtr, calArray, 0, calArray.Length);
    if (calendarIndex < 0 || calendarIndex >= calArray.Length)
        throw new ArgumentOutOfRangeException("calendarIndex");

    // The pool is in protected memory: make it writable, patch it, then restore the old protection.
    UIntPtr size = new UIntPtr((uint)(calArray.Length * sizeof(short)));
    uint oldProtect;
    if (!VirtualProtect(calArrayPtr, size, PAGE_READWRITE, out oldProtect))
        throw new Win32Exception(Marshal.GetLastWin32Error());
    calArray[calendarIndex] = CAL_PERSIAN;
    Marshal.Copy(calArray, 0, calArrayPtr, calArray.Length);
    VirtualProtect(calArrayPtr, size, oldProtect, out oldProtect);

    return culture;
}


CultureData in .Net framework 4.0

In .Net 4 the CultureTableRecord class has been replaced by CultureData, which holds the optional calendars as a private array of integers in the waCalendars field. This makes correcting the optional calendars as easy as correcting a private field:

using System.Globalization;
using System.Reflection;

// .NET 4: CultureData keeps the Windows calendar IDs in its private int[] field "waCalendars".
private static CultureInfo _FixOptionalCalendars4(CultureInfo culture, int calendarIndex)
{
    const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance;

    // Reading OptionalCalendars also makes CultureData fill waCalendars if it has not done so yet.
    if (culture.OptionalCalendars.Length == 0)
        return culture;

    FieldInfo cultureDataField = typeof(CultureInfo).GetField("m_cultureData", flags);
    object cultureData = cultureDataField.GetValue(culture);
    FieldInfo waCalendarsField = cultureData.GetType().GetField("waCalendars", flags);
    int[] waCalendars = (int[])waCalendarsField.GetValue(cultureData);

    // 0x16 (22) is the Windows calendar ID of the Persian calendar.
    if (waCalendars != null && calendarIndex >= 0 && calendarIndex < waCalendars.Length)
        waCalendars[calendarIndex] = 0x16;
    waCalendarsField.SetValue(cultureData, waCalendars);
    return culture;
}


Problems with the Persian culture in .Net have been discussed above, and methods for correcting them have been proposed. You may download the sample code from here: Download Sample Code

Written by Babak Mahmoudi

August 22, 2011 at 11:09 am

Posted in Persian Localization


.Net Profiling for Persian Localization, Cons and Pros


There’s no doubt that, in order to localize .Net applications reasonably well for Persian, the localizer will sooner or later have to consider tampering with .Net assemblies. The reason is the unreasonably poor implementation of the Persian calendar in .Net and in .Net applications. Providing the Persian calendar is a must in most localization projects, and there is no way other than tampering with assemblies to provide that support.

For instance, local users in Iran cannot live without the Persian calendar in their SharePoint sites, and you have to modify code in SharePoint's most important assembly (SharePoint.dll) to support it. Part of the reason is that Windows in general does not allow third-party calendar systems to be added to the operating system.

The main approach to providing the Persian calendar in SharePoint is to substitute the Persian calendar for one of SharePoint's standard calendars, such as Hijri. To do this, one has to somehow replace methods of an internal class, namely HijriCalendarImplementation. For instance, this class has a static method, JulianDayToDate, that converts a Julian day to a SharePoint SimpleDate structure; obviously this must change if one plans to substitute the Persian calendar for Hijri. Experts here in Iran have used readily available tools such as Reflector to disassemble the IL code of SharePoint.dll, replace parts of it and rebuild the assembly. They then replace the original assemblies with these modified versions, and this way they have accomplished the mission.

When I first got this mission at Gostareh Negar, I actually didn’t know much about .Net programming. I was a C++ programmer, already an expert in native code tracing and DLL overriding. From my experience with native code, I knew that sooner or later rebuilding binary DLLs would show its disadvantages. So I focused my efforts on coming up with a solution that does not require replacing the original libraries on persistent storage (the hard disk). This led me to the .Net Profiling API.

The .Net Profiling API (see here) was originally devised for profiling tasks, i.e. performance measurements. With it one can instrument assemblies with specific calls to measure code metrics such as speed; for instance, it can insert calls at method entry and exit points so that the total execution time of a method can be recorded. In effect, the Profiling API provides ways to inject code at run time while the CLR executes an application.

The CLR (Common Language Runtime) includes a cross-CPU instruction format (Intermediate Language, or IL) and a JIT compiler that turns IL into code executable by the target CPU. When it executes an assembly, it first just-in-time compiles the IL code into native machine instructions for the target CPU. During this process the CLR can be asked to call a registered profiler and let it perform profiling tasks and instrumentation, including replacing the IL code. This opens a way to change IL code at run time, and an elegant way to accomplish our localization mission.

While traditional profilers focus on instrumenting methods with measuring and logging calls, I focused on redirecting methods, and finally came up with a Redirector. It redirects method calls to another assembly by replacing the method body with a call to an injected method. Now I was able to inject my Persian calendar code directly into SharePoint.dll without touching the original assembly on disk.

This method of code redirection based on profiling has many advantages, including:

  • It’s switchable: many users fear that messing with binary code may have side effects and cause errors. Since profilers can easily be switched off through server configuration, in case of suspicious behavior one can simply switch the redirection off and check whether the problem lies in the injected code.
  • It does not require a rebuild for new versions: when the original vendor releases a new version of the assembly, there is a good chance that the redirected code has not changed. In the case of SharePoint, for instance, the Hijri calendar code didn’t change across the service packs or in SharePoint 2010. Therefore the redirector may still work with a newer version of the assembly, while the rebuild approach requires rebuilding it again. In fact, Gostareh Negar clients installed SharePoint service packs without asking for an update.
  • It does not interfere with code signing: since original assemblies are normally signed, rebuilding them requires re-signing, which is normally a headache. Redirection occurs in the JIT compilation phase and does not run into signing issues.
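The “switchable” property comes from how the .NET Framework CLR decides to load a profiler: at process start it reads the COR_ENABLE_PROFILING and COR_PROFILER environment variables, where COR_PROFILER holds the profiler's COM CLSID. A minimal sketch for a child process follows; the CLSID is a placeholder and the class name is hypothetical. For an IIS worker process (w3wp.exe), the same variables would have to reach the worker's environment through server configuration, which is not shown here.

```csharp
using System;
using System.Diagnostics;

public static class ProfilerSwitch
{
    // Placeholder CLSID: replace with the CLSID your profiler DLL registers.
    public const string ProfilerClsid = "{00000000-0000-0000-0000-000000000000}";

    // Builds start info for a child process with the CLR profiler switched on or off.
    public static ProcessStartInfo Create(string exePath, bool enableProfiler)
    {
        // Environment variables only apply when the shell is not used to start the process.
        var psi = new ProcessStartInfo(exePath) { UseShellExecute = false };
        if (enableProfiler)
        {
            psi.EnvironmentVariables["COR_ENABLE_PROFILING"] = "1";
            psi.EnvironmentVariables["COR_PROFILER"] = ProfilerClsid;
        }
        else
        {
            // Redirection off: the CLR runs the original, unmodified IL.
            psi.EnvironmentVariables["COR_ENABLE_PROFILING"] = "0";
        }
        return psi;
    }
}
```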

There are also disadvantages:

  • Speed: .Net code runs more slowly while being profiled, since the CLR has to issue profiling notifications in addition to its normal tasks. The slowdown occurs in the load phase; once the program has been completely JIT compiled, the effect vanishes. For web applications this happens when the w3wp process restarts.


Method redirection based on .Net profiling can reasonably be a good plan for Persian localization, at least for web applications.





Written by Babak Mahmoudi

July 27, 2011 at 7:06 pm

Posted in Persian Language

State Machine Workflows


I often find the differences between State Machine and Sequential workflows much like those between event-driven programming and old-style control flow. Back in the days when we programmed in FORTRAN, we often thought about how to control the flow of a program with branch instructions to do a job. The flow of control usually had only one, or at most a few, predefined paths, and we could plan for these paths with IF THEN branching instructions. Event-driven programs are not like that. Who can imagine the paths of instructions when someone uses Word?

State Machines are a type of event-driven workflow, most suitable for situations where it is hard to draw all the possible paths of a process. This is why State Machines provide a more flexible approach to programming business processes. It’s a pity SharePoint focuses on Sequential workflows.
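To make the contrast concrete, here is a minimal, hypothetical document-review state machine in plain C# (it does not use Windows Workflow Foundation or SharePoint APIs). Instead of one predefined path, it declares which events are allowed in each state and ignores the rest, so the actual path depends on the order in which events arrive.

```csharp
using System;
using System.Collections.Generic;

public enum DocState { Draft, InReview, Approved, Rejected }
public enum DocEvent { Submit, Approve, Reject, Revise }

public sealed class DocumentStateMachine
{
    // Allowed transitions: (current state, event) -> next state.
    private static readonly Dictionary<(DocState, DocEvent), DocState> Transitions =
        new Dictionary<(DocState, DocEvent), DocState>
        {
            [(DocState.Draft, DocEvent.Submit)] = DocState.InReview,
            [(DocState.InReview, DocEvent.Approve)] = DocState.Approved,
            [(DocState.InReview, DocEvent.Reject)] = DocState.Rejected,
            [(DocState.Rejected, DocEvent.Revise)] = DocState.Draft,
        };

    public DocState State { get; private set; } = DocState.Draft;

    // Applies the event if it is allowed in the current state; returns false otherwise.
    public bool Fire(DocEvent e)
    {
        if (!Transitions.TryGetValue((State, e), out DocState next))
            return false;
        State = next;
        return true;
    }
}
```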

Written by Babak Mahmoudi

July 24, 2011 at 7:26 am

Posted in SharePoint

Localization of MOSS Built-In Workflows


MOSS comes with a number of built-in workflows, such as Approval, Collect Signatures and Feedback. It seems that standard language template packs, like the Arabic one, do not support these workflows. The challenge of localizing these workflows is actually that of translating the associated InfoPath forms. In this post I’ll discuss it in detail.

 Research and Findings:

XSN files are actually CAB files: one may extract them, edit the contents and then recab them.

 Generic workflows, like Translation Management, are implemented in Microsoft.Office.Workflow.Pages. They have a code-behind page for IniWrkflIP.ASPX with an OnLoad method that actually computes the form and requests it from the form server. The localization occurs here: a $Subst:LCID in the form URN is replaced with the actual LCID.

internal static string GetLocalizedFormUrn(string formUrn, …)
{
    return formUrn.Replace("$Subst:LCID;", web.Language.ToString(CultureInfo.InvariantCulture));
}
Problem with “the form is not workflow enabled”
I did everything I could to prepare a Persian version of the forms, but when I installed them I still had a problem: SharePoint reported a “the form is not workflow enabled” error. In fact, the Workflow column of the form templates list in Central Administration showed “No” for my forms. Now I know that the forms must be installed together with the feature to become workflow-enabled, so I reinstalled the workflow feature. It seems that any form in the feature directory gets installed this way.
We don’t know why the original form templates are not “published” the way we have to “publish” our own forms to a SharePoint site before installing them on the form server. When we publish our forms, they are marked as published on our own servers, and we don’t know how they will then be published on client URLs:

The XSN has a tag with a publishUrl attribute; if you omit this attribute, the form server will assume that the form is correctly published.
<xsf:xDocumentClass solutionVersion="…" productVersion="12.0.0" solutionFormatVersion="…"
    name="urn:schemas-microsoft-com:office:infopath:ReviewRouting-Assoc-1065-Edit:-myXSD"
    publishUrl="C:\MyProjects\FarsiMOSS\GenericWorkflows\Review\Assoc\ReviewRouting_Assoc_1065_Edit.xsn"
    xmlns:xsf="…" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:xdUtil="…" xmlns:xdXDocument="…"
    xmlns:xdMath="…" xmlns:xdDate="…" xmlns:my="…" xmlns:xd="…" xmlns:xsi="…" xmlns:xhtml="…"
    xmlns:aml="…" xmlns:dt="uuid:C2F…
Finally, these are the steps to translate a generic workflow form:

  • Get a copy of the form. It is in the Features directory. For example, for Review_Assoc:
    C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\FEATURES\ReviewWorkflows\Forms\ReviewRouting_Assoc_1033.XSN
  • Do the translation with InfoPath and save the file for 1065 (the Persian LCID). For our example it would be ReviewRouting_Assoc_1065.XSN
  • Set the locale of the form to Persian if you like. This only affects the Manage Form Templates page in Central Administration, which will then show the correct form locale. Whatever locale you set, SharePoint actually uses the Form ID, which must be corrected for 1065 (see below).
  • After completing the translation, rename the file with a .cab extension. Didn’t you know? InfoPath forms are actually cabinet-compressed files.
  • You’ll find a MANIFEST.XSF file in the cabinet. Open it with Notepad and edit it:
    • At the top of the file you’ll find an xsf:xDocumentClass tag with a name attribute. Set it to the correct form name: remember that when you edited the form template, InfoPath automatically changed the form name here. It should be identical to the English form name, with just 1033 changed to 1065. For our example it would be:
    • Right next to it you’ll find a publishUrl attribute. Remove the attribute entirely.


  • You should then edit the Template.XML file. This file also contains a solution name, which should be identical to the form name you set in the previous step. At the top of the file you’ll find a ?mso-infoPathSolution tag; set its name attribute. For our example it would be:
  • Now recab the form. You may use:
    Cabarc n ReviewRouting_Assoc_1065.XSN *.*
  • Copy the form to the solution folder, where you picked up the original 1033 version.
  • Now the feature should be reinstalled, so that the forms are installed with it. One may use STSADM to do that:
    stsadm -o deactivatefeature -name reviewworkflows -force -url http://babakserver
    stsadm -o uninstallfeature -name reviewworkflows -force
    stsadm -o installfeature -name reviewworkflows
    stsadm -o activatefeature -name reviewworkflows -url http://babakserver


  • Sometimes you will also need to deactivate and reactivate the feature from the Site Collection Features page.
  • To check that everything is OK, go to SharePoint Central Administration \ Application Management \ Manage Form Templates. You should see something like this:
  • Note that the form should be Workflow Enabled.
  • Click on the 1065 version and use the View Properties option. You should see something like this:

  • In particular, check the Form ID: it should be exactly as shown here, otherwise the form won’t be found.
  • Repeat the procedure for all forms in the workflow.
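The MANIFEST.XSF and Template.XML edits in the steps above can also be scripted. The sketch below assumes the cabinet has already been extracted and the files loaded with XDocument.Load; it finds xDocumentClass by local name (so it does not depend on the xsf namespace URI), sets its name attribute to a form name you supply, removes publishUrl, and rewrites the name pseudo-attribute of the mso-infoPathSolution processing instruction. The class and method names are hypothetical, and the form name must still be worked out as described above (the English name with 1033 changed to 1065). Save the documents back with XDocument.Save before recabbing.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class InfoPathFormFixer
{
    // MANIFEST.XSF: set xDocumentClass/@name to formName and drop @publishUrl.
    public static void FixManifest(XDocument manifest, string formName)
    {
        // Match by local name so the sketch does not depend on the xsf namespace URI.
        XElement docClass = manifest.Descendants()
            .First(e => e.Name.LocalName == "xDocumentClass");
        docClass.SetAttributeValue("name", formName);
        docClass.Attribute("publishUrl")?.Remove();
    }

    // Template.XML: set the name pseudo-attribute of <?mso-infoPathSolution ...?> to formName.
    public static void FixTemplate(XDocument template, string formName)
    {
        XProcessingInstruction pi = template.Nodes()
            .OfType<XProcessingInstruction>()
            .First(p => p.Target == "mso-infoPathSolution");
        // A MatchEvaluator avoids treating characters in formName as substitution syntax.
        pi.Data = Regex.Replace(pi.Data, @"\bname=""[^""]*""",
            m => "name=\"" + formName + "\"");
    }
}
```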

Now, hopefully, you will see something like this:



Written by Babak Mahmoudi

December 23, 2008 at 8:19 am

Posted in SharePoint


Morphological Rules Pertaining to Persian Spell Checking


“Analyzing Persian texts with some kind of stemmer algorithm is essential for efficient spell checking because it provides the level of consistency needed and it can work with a concise lexicon.
In this article the morphological rules pertaining to such algorithms are studied.”
In Persian, words are extensively combined with various prefixes and suffixes to make new words. In this sense, and if we define words digitally as strings of characters surrounded by spaces, the number of Persian words is enormously larger than in languages written in Latin script, such as English. For example, the word كتاب (ketab = book) generates the following derivatives:

ketab_ha books
ketab_am my book, I am a book 
ketab_at your book 
ketab_ash his book 
ketab_eman our book 
ketab_eshan their book 
ketab_i a book, you are a book, related to books
ketab_im we are books 
ketab_id you are books 
ketab_and they are books 
ketab_itar more related to books 
ketab_itarin most related to books 
ketab_hayam my books 
ketab_hayat your books 
ketab_hayash his/her books
ketab_hayeman our books
ketab_hayetan your books
ketab_hayeshan their books
ketab_haei some books
*The suffixes are presented just as they are spelled in Persian.

 As seen in this example, 19 different words can be made from the simple root “ketab”.

 The term Morphological Rules, then, refers to the rules in Persian that specify how new words can be made. It should be noted that by making words we do not mean generating totally new words, as the phrase is usually meant in Persian literature. No one actually talks about ‘ketab_ha’ as a new word made from ‘ketab’; we count it as one only because of our digital definition of a word: “a string of letters separated by space”.
Thus, here we confine ourselves to those simple and certain rules that are thought to be useful in the process of digital proofing.

The term certain is important because we are not going to consider patterns that are rarely used; we consider rules that can be applied in almost all cases. Nevertheless, the rules apply to words according to their grammatical nature. For example, you cannot pluralize a pronoun, and only verbs can be conjugated.

Thus, it should be assumed that the Morphological Rules studied here are supported by a lexicon in which morphemes are stored with flags that designate their grammatical nature with respect to the stated rules. The terms Flag and Morpheme in this article refer to such a lexicon…
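A minimal sketch of how such a flagged lexicon can back a spell checker, using the ketab example above. Words are written in the transliteration used in the list (without the underscore), only the suffixes listed above are handled, and the flag names and the suffix-to-flag assignment are simplified, illustrative assumptions rather than the rule set of the full document (for instance, "am" is treated only as possessive). A word is accepted if it is in the lexicon, or if stripping one known suffix leaves a root whose flags allow that suffix; a pronoun root such as "man" (I) without the Plural flag therefore cannot take "ha".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical grammatical flags stored with each lexicon entry.
[Flags]
public enum MorphFlags
{
    None = 0,
    Plural = 1,      // accepts ha, haei
    Possessive = 2,  // accepts am, at, ash, eman, eshan
    Predicative = 4, // accepts i, im, id, and
    Comparative = 8  // accepts itar, itarin
}

public sealed class SuffixChecker
{
    // Transliterated suffixes from the ketab example and the flags a root needs to take them.
    private static readonly (string Suffix, MorphFlags Needs)[] Suffixes =
    {
        ("ha", MorphFlags.Plural), ("haei", MorphFlags.Plural),
        ("am", MorphFlags.Possessive), ("at", MorphFlags.Possessive), ("ash", MorphFlags.Possessive),
        ("eman", MorphFlags.Possessive), ("eshan", MorphFlags.Possessive),
        ("i", MorphFlags.Predicative), ("im", MorphFlags.Predicative),
        ("id", MorphFlags.Predicative), ("and", MorphFlags.Predicative),
        ("itar", MorphFlags.Comparative), ("itarin", MorphFlags.Comparative),
        ("hayam", MorphFlags.Plural | MorphFlags.Possessive), ("hayat", MorphFlags.Plural | MorphFlags.Possessive),
        ("hayash", MorphFlags.Plural | MorphFlags.Possessive), ("hayeman", MorphFlags.Plural | MorphFlags.Possessive),
        ("hayetan", MorphFlags.Plural | MorphFlags.Possessive), ("hayeshan", MorphFlags.Plural | MorphFlags.Possessive),
    };

    private readonly IDictionary<string, MorphFlags> _lexicon;

    public SuffixChecker(IDictionary<string, MorphFlags> lexicon)
    {
        _lexicon = lexicon;
    }

    // True if the word is a root, or a root plus one suffix its flags allow.
    public bool IsKnown(string word)
    {
        if (_lexicon.ContainsKey(word))
            return true;
        foreach (var (suffix, needs) in Suffixes)
        {
            if (word.Length <= suffix.Length || !word.EndsWith(suffix, StringComparison.Ordinal))
                continue;
            string stem = word.Substring(0, word.Length - suffix.Length);
            if (_lexicon.TryGetValue(stem, out MorphFlags flags) && (flags & needs) == needs)
                return true;
        }
        return false;
    }
}
```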

The rest of the article is available at the following link.

Download Complete Document

Written by Babak Mahmoudi

December 8, 2008 at 2:38 pm

Posted in Persian Language