Here's an interesting observation I made while looking at some code in Reflector. If you have a switch statement on a string with a small number of cases (fewer than about 10), the compiler turns the switch into a series of if/else statements. In the decompiled code below, it first calls string.IsInterned on the input and then compares the result against each case in turn. If the list is longer, the compiler creates a Hashtable and inserts all of the case strings, each mapped to an integer index. It then looks up the value of the switch expression in the Hashtable and uses the resulting index to jump to the matching case. I assume this is done for performance: a hashtable lookup costs about the same no matter how many cases there are, while a chain of comparisons gets longer with every case. Based on this, it seems best to use enumerations rather than hard-coded strings wherever possible.
Decompiled Switch statement with 5 cases
public string SelectILTest5(string input)
{
    string text2;
    if ((text2 = input) != null)
    {
        text2 = string.IsInterned(text2);
        if (text2 != "a1")
        {
            if (text2 == "a2")
            {
                return "a2";
            }
            if (text2 == "a3")
            {
                return "a3";
            }
            if (text2 == "a4")
            {
                return "a4";
            }
            if (text2 == "a5")
            {
                return "a5";
            }
        }
        else
        {
            return "a1";
        }
    }
    return "";
}
Decompiled Switch statement with 15 cases (C#)
public string SelectILTest15(string input)
{
    switch (input)
    {
        case "a1":
        {
            return "a1";
        }
        case "a2":
        {
            return "a2";
        }
        case "a3":
        {
            return "a3";
        }
        case "a4":
        {
            return "a4";
        }
        case "a5":
        {
            return "a5";
        }
        case "a6":
        {
            return "a6";
        }
        case "a7":
        {
            return "a7";
        }
        case "a8":
        {
            return "a8";
        }
        case "a9":
        {
            return "a9";
        }
        case "a10":
        {
            return "a10";
        }
        case "a11":
        {
            return "a11";
        }
        case "a12":
        {
            return "a12";
        }
        case "a13":
        {
            return "a13";
        }
        case "a14":
        {
            return "a14";
        }
        case "a15":
        {
            return "a15";
        }
    }
    return "";
}
Decompiled Switch statement with 15 cases (IL)
.method public hidebysig instance string SelectILTest15(string input) cil managed
{
// Code Size: 524 byte(s)
.maxstack 4
.locals init (
string text1,
object obj1)
L_0000: volatile
L_0002: ldsfld [mscorlib]System.Collections.Hashtable <PrivateImplementationDetails>::$$method0x6000015-1
L_0007: brtrue L_0124
L_000c: ldc.i4.s 30
L_000e: ldc.r4 0.5
L_0013: newobj instance void [mscorlib]System.Collections.Hashtable::.ctor(int32, float32)
L_0018: dup
L_0019: ldstr "a1"
L_001e: ldc.i4.0
L_001f: box int32
L_0024: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_0029: dup
L_002a: ldstr "a2"
L_002f: ldc.i4.1
L_0030: box int32
L_0035: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_003a: dup
L_003b: ldstr "a3"
L_0040: ldc.i4.2
L_0041: box int32
L_0046: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_004b: dup
L_004c: ldstr "a4"
L_0051: ldc.i4.3
L_0052: box int32
L_0057: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_005c: dup
L_005d: ldstr "a5"
L_0062: ldc.i4.4
L_0063: box int32
L_0068: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_006d: dup
L_006e: ldstr "a6"
L_0073: ldc.i4.5
L_0074: box int32
L_0079: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_007e: dup
L_007f: ldstr "a7"
L_0084: ldc.i4.6
L_0085: box int32
L_008a: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_008f: dup
L_0090: ldstr "a8"
L_0095: ldc.i4.7
L_0096: box int32
L_009b: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_00a0: dup
L_00a1: ldstr "a9"
L_00a6: ldc.i4.8
L_00a7: box int32
L_00ac: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_00b1: dup
L_00b2: ldstr "a10"
L_00b7: ldc.i4.s 9
L_00b9: box int32
L_00be: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_00c3: dup
L_00c4: ldstr "a11"
L_00c9: ldc.i4.s 10
L_00cb: box int32
L_00d0: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_00d5: dup
L_00d6: ldstr "a12"
L_00db: ldc.i4.s 11
L_00dd: box int32
L_00e2: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_00e7: dup
L_00e8: ldstr "a13"
L_00ed: ldc.i4.s 12
L_00ef: box int32
L_00f4: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_00f9: dup
L_00fa: ldstr "a14"
L_00ff: ldc.i4.s 13
L_0101: box int32
L_0106: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_010b: dup
L_010c: ldstr "a15"
L_0111: ldc.i4.s 14
L_0113: box int32
L_0118: call instance void [mscorlib]System.Collections.Hashtable::Add(object, object)
L_011d: volatile
L_011f: stsfld [mscorlib]System.Collections.Hashtable <PrivateImplementationDetails>::$$method0x6000015-1
L_0124: ldarg.1
L_0125: dup
L_0126: stloc.1
L_0127: brfalse L_0202
L_012c: volatile
L_012e: ldsfld [mscorlib]System.Collections.Hashtable <PrivateImplementationDetails>::$$method0x6000015-1
L_0133: ldloc.1
L_0134: call instance object [mscorlib]System.Collections.Hashtable::get_Item(object)
L_0139: dup
L_013a: stloc.1
L_013b: brfalse L_0202
L_0140: ldloc.1
L_0141: unbox int32
L_0146: ldind.i4
L_0147: switch (L_018a, L_0192, L_019a, L_01a2, L_01aa, L_01b2, L_01ba, L_01c2, L_01ca, L_01d2, L_01da, L_01e2, L_01ea, L_01f2, L_01fa)
L_0188: br.s L_0202
L_018a: ldstr "a1"
L_018f: stloc.0
L_0190: br.s L_020a
L_0192: ldstr "a2"
L_0197: stloc.0
L_0198: br.s L_020a
L_019a: ldstr "a3"
L_019f: stloc.0
L_01a0: br.s L_020a
L_01a2: ldstr "a4"
L_01a7: stloc.0
L_01a8: br.s L_020a
L_01aa: ldstr "a5"
L_01af: stloc.0
L_01b0: br.s L_020a
L_01b2: ldstr "a6"
L_01b7: stloc.0
L_01b8: br.s L_020a
L_01ba: ldstr "a7"
L_01bf: stloc.0
L_01c0: br.s L_020a
L_01c2: ldstr "a8"
L_01c7: stloc.0
L_01c8: br.s L_020a
L_01ca: ldstr "a9"
L_01cf: stloc.0
L_01d0: br.s L_020a
L_01d2: ldstr "a10"
L_01d7: stloc.0
L_01d8: br.s L_020a
L_01da: ldstr "a11"
L_01df: stloc.0
L_01e0: br.s L_020a
L_01e2: ldstr "a12"
L_01e7: stloc.0
L_01e8: br.s L_020a
L_01ea: ldstr "a13"
L_01ef: stloc.0
L_01f0: br.s L_020a
L_01f2: ldstr "a14"
L_01f7: stloc.0
L_01f8: br.s L_020a
L_01fa: ldstr "a15"
L_01ff: stloc.0
L_0200: br.s L_020a
L_0202: ldstr ""
L_0207: stloc.0
L_0208: br.s L_020a
L_020a: ldloc.0
L_020b: ret
}
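To make the IL easier to follow, here is a rough hand-written C# equivalent of what the compiler emitted for SelectILTest15: build a string-to-index table once, look the input up, then switch on the integer index. This is a sketch, not the compiler's actual output. The class and method names are hypothetical, and it uses the generic Dictionary<string, int> where the decompiled code uses a non-generic Hashtable with boxed int values.

```csharp
using System.Collections.Generic;

// Hypothetical hand-lowered version of SelectILTest15, mirroring the IL pattern:
// a lazily built string -> case-index table, one lookup, then a dense integer switch.
public static class StringSwitchSketch
{
    // Cached in a static field, like the <PrivateImplementationDetails> Hashtable in the IL.
    private static volatile Dictionary<string, int> caseIndex;

    public static string SelectLowered(string input)
    {
        var table = caseIndex;
        if (table == null)
        {
            // Populate fully before publishing, so other threads never see a half-built table.
            var map = new Dictionary<string, int>(15);
            for (int i = 0; i < 15; i++)
            {
                map.Add("a" + (i + 1), i); // "a1" -> 0, "a2" -> 1, ..., "a15" -> 14
            }
            caseIndex = map;
            table = map;
        }

        int index;
        if (input == null || !table.TryGetValue(input, out index))
        {
            return ""; // null or not one of the cases: same result as the default path
        }

        // Indices 0..14 are dense, so this can compile to a single IL 'switch' jump table.
        switch (index)
        {
            case 0: return "a1";
            case 1: return "a2";
            case 2: return "a3";
            case 3: return "a4";
            case 4: return "a5";
            case 5: return "a6";
            case 6: return "a7";
            case 7: return "a8";
            case 8: return "a9";
            case 9: return "a10";
            case 10: return "a11";
            case 11: return "a12";
            case 12: return "a13";
            case 13: return "a14";
            case 14: return "a15";
            default: return "";
        }
    }
}
```

Each call costs one hash lookup plus one indexed jump regardless of how many cases there are, which is presumably why the compiler changes strategy once the case list grows. A switch on an enum skips the table entirely, because the enum's integer values can feed the IL switch directly.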
We're in the early stages of a project I'm currently working on, and we're starting to see some performance concerns. Since the piece I wrote (the data access layer) is used pretty heavily, I decided to profile the application to see if there were any glaring bottlenecks.
I downloaded nprof from SourceForge and used that as the profiler. Overall, it was pretty easy to set up and use. My impression is that profiling with nprof didn't add much overhead to the profiled process.
When you're done running a profile, you're left with some pretty good statistics: the number of times each method was called, the percentage of total time spent in that method, and the percentage of time spent in its child methods. These are all broken out by thread. You can see which methods called a particular method and which methods it calls. It would be nice if there were more information about each thread, such as its lifetime.
It turns out my data layer spends a measurable amount of time creating temporary serialization assemblies, so I'm going to rework it to use this method by Daniel Cazzulino.
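Daniel Cazzulino's approach isn't reproduced here. As a simpler, generic illustration of the same problem, this sketch keeps one XmlSerializer per type, so the expensive construction step (which can generate and load a temporary assembly) happens once per type instead of on every call. SerializerCache and Order are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper: one XmlSerializer per type, created on first use and reused afterwards.
public static class SerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object sync = new object();

    public static XmlSerializer For(Type type)
    {
        lock (sync)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(type, out serializer))
            {
                // Constructing the serializer is the costly part; do it only once per type.
                serializer = new XmlSerializer(type);
                cache.Add(type, serializer);
            }
            return serializer;
        }
    }

    public static string ToXml<T>(T value)
    {
        using (var writer = new StringWriter())
        {
            For(typeof(T)).Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static T FromXml<T>(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (T)For(typeof(T)).Deserialize(reader);
        }
    }
}

// Hypothetical data-layer type used only for this example.
public class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }
}
```

Caching instances matters most for the XmlSerializer constructor overloads that the framework does not cache internally, since each new instance built with those can produce another temporary assembly.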
On the whole, it is a great tool. It’s unfortunate that there hasn’t been any refinement to it lately.
Code Camp IV, held this past weekend, was pretty cool. The turnout was incredible, which shows how much the Boston community supports these kinds of events. All the attendees I talked with loved the sessions. One of the main themes I kept hearing was, "I know something about topic XYZ, but you never know what you don't know. The session gave me a lot more depth."
I delivered three talks, which went over reasonably well. It was also cool to meet some of my fellow user group leaders, like Julie Lerman and Jason Haley.
Can't wait for Code Camp V!
Code Camp IV – Developers Gone Wild
If you're looking for a lot of great information, check out Code Camp IV this weekend. Here's a link to register. I’ll be presenting three sessions:
Sept 24, 2005, 7:15 PM – Rhode Island Room
Scraping Data from Websites with the .NET Framework
Sept 25, 2005, 10:45 AM – Providence Room
Introduction to Code Access Security
Sept 25, 2005, 1:15 PM – Technical Briefing Center
Improving the performance of your SQL 2000 database
See you there!
http://www.joelonsoftware.com/articles/Unicode.html
This article is awesome! True to the author's claim, dealing with Unicode is not hard once you understand it. I've been one of those developers who kind of hoped everything would go back to normal, where a character is always 8 bits. I'd been putting off learning about string encoding because it seemed like a complicated subject for a problem I rarely have. It really isn't. Give Joel's article a read.
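A quick way to see that a character isn't always 8 bits is to ask .NET how many bytes a few strings take in different encodings. This sketch uses only System.Text.Encoding; the EncodingDemo class name is made up for the example.

```csharp
using System.Text;

// Byte counts depend on the encoding, not just on the number of characters.
public static class EncodingDemo
{
    // UTF-8: 1 to 4 bytes per code point.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    // UTF-16 little-endian (Encoding.Unicode), no byte-order mark counted:
    // 2 bytes per code unit, so characters outside the BMP take 4 bytes.
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);
}
```

In UTF-8, "A" takes 1 byte, "\u00E9" (é) takes 2, "\u20AC" (€) takes 3, and the musical G clef "\U0001D11E" takes 4. In UTF-16, the first three take 2 bytes each, while the G clef needs a surrogate pair, so even its .NET string Length is 2.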
As I was preparing for my talk tonight on web scraping, I came across a class library that has proved invaluable. The HTML Agility Pack is awesome. It lets you download the HTML from a website and navigate through it like an XML document, including with XPath queries. You could do this before by hosting an IE browser control in your scraping app and walking through the document using the DOM. However, the IE browser control has problems with badly formed HTML, and unfortunately, most of the HTML on the web is not well formed. The HTML Agility Pack handles badly formed HTML just as easily as well-formed HTML. This cut the time it takes me to write a scraper from a couple of days to a couple of hours.
People in my fantasy baseball league, beware. I've downloaded a significant amount of baseball data and I know how to use it!
Tonight I'm speaking at the NH VB.NET user group. If any of you are around the Manchester NH area, stop by.
NHVBUG Meeting - March 24, 2004 - SNHU Campus
5:30 - 6:00 General Announcements & Questions
6:00 - 7:00 Marc Thevenin will be demonstrating how to create .NET applications that are self-updating using a freely available component. Never worry about deploying software updates again!
7:00 - 7:30 Phil Denoncourt will continue his talk on web scraping. Web scraping is the process of writing software agents that download information from the internet, parse it, and then load it into a database.
7:30 - 8:00 Windows XP Service Pack 2. This is not your typical service pack. Fundamental changes to the operating system will change the way your application works. Come find out what you need to be aware of.
A few other developers at work and I were discussing GUIDs/UUIDs and why they are the ultimate primary key. I love them! In every new database I start, I use a GUID as the data type for primary keys. This frees me from the semantic key problem. It also gives you the freedom to create the primary key from within your application, rather than making a call to the database to get the next key. This gives you tremendous flexibility in disconnected apps and improves scalability in traditional apps. Best thing since shoes without laces.
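A minimal sketch of generating keys in the application, assuming hypothetical Customer and Invoice classes whose Guid keys map to uniqueidentifier primary-key columns. Because each key exists as soon as the object is created, the invoice can reference its customer before anything is saved, with no round trip to fetch an identity value.

```csharp
using System;

// Hypothetical entities: keys are assigned client-side with Guid.NewGuid().
public class Customer
{
    public Guid Id { get; } = Guid.NewGuid();
    public string Name { get; set; }
}

public class Invoice
{
    public Guid Id { get; } = Guid.NewGuid();

    // Foreign key, known immediately; no database call needed to obtain it.
    public Guid CustomerId { get; set; }

    public decimal Amount { get; set; }
}
```

Usage: create `var customer = new Customer { Name = "Contoso" };`, then `new Invoice { CustomerId = customer.Id, Amount = 100m }`, and insert both rows later in a single batch or after reconnecting.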
Marc Thevenin was convinced that there was no way to guarantee that a key based on a number would be truly unique. I considered this a challenge and set out to prove him wrong.
It turns out he's theoretically right, at least given the way GUIDs are generated. Information at opengroup.org outlines the algorithm used to generate GUIDs. If you look at it, it's based almost entirely on two things: the time and a MAC address (along with a short clock sequence). Many motherboards with integrated network adapters let you change the MAC address. So if you set two computers to have the same MAC address (not on the same network segment, of course) and have them generate GUIDs at exactly the same time (within the same 100 nanoseconds), the two computers could generate identical GUIDs, provided their clock sequences also matched. So it is possible in theory, but I suspect the chances of it actually happening are comparable to winning five different state lotteries on the same day.
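The version-1 layout from that algorithm (RFC 4122 describes the same one) fits in a few lines, which makes it easy to see that identical time, clock sequence, and node inputs give an identical GUID. This is an illustrative sketch with a hypothetical UuidV1Sketch class; it is not how Guid.NewGuid() works on current Windows versions, which generate random (version 4) GUIDs instead.

```csharp
using System;

// Illustrative version-1 UUID formatter: purely a function of its three inputs.
public static class UuidV1Sketch
{
    private static readonly DateTime GregorianEpoch =
        new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);

    // 60-bit count of 100 ns intervals since Oct 15, 1582 (DateTime ticks are also 100 ns).
    public static ulong TimestampFor(DateTime utc) => (ulong)(utc.Ticks - GregorianEpoch.Ticks);

    // clockSeq: 14-bit clock sequence; node: 48-bit node ID (normally the MAC address).
    public static string Format(ulong timestamp, ushort clockSeq, ulong node)
    {
        uint timeLow = (uint)(timestamp & 0xFFFFFFFF);
        ushort timeMid = (ushort)((timestamp >> 32) & 0xFFFF);
        ushort timeHiAndVersion = (ushort)(((timestamp >> 48) & 0x0FFF) | 0x1000); // version 1
        byte clockSeqHi = (byte)(((clockSeq >> 8) & 0x3F) | 0x80);                   // RFC 4122 variant bits
        byte clockSeqLow = (byte)(clockSeq & 0xFF);
        ulong nodeBits = node & 0xFFFFFFFFFFFFUL;

        return string.Format("{0:x8}-{1:x4}-{2:x4}-{3:x2}{4:x2}-{5:x12}",
            timeLow, timeMid, timeHiAndVersion, clockSeqHi, clockSeqLow, nodeBits);
    }
}
```

For example, Format(0, 0, 0) yields "00000000-0000-1000-8000-000000000000". Two machines calling Format with the same three values get the same string, which is exactly the collision scenario above.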
Looking at the routine, I am amazed at the simplicity and robustness, but I do wonder about the following:
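The time used to generate a GUID is the number of 100-nanosecond intervals that have elapsed since Oct 15, 1582. By my count, about 128,504,448,000,000,000 intervals passed between then and Jan 1, 1990, and none of them will ever be used to generate a GUID. Just counting that many intervals takes nearly 57 bits, which sounds like an enormous waste. However, the timestamp field is 60 bits wide (enough for roughly 3,650 years), so those unused years consume only about 11% of its range, a fraction of a bit, rather than a large share of the GUID's 128 bits.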
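The interval count is easy to check with DateTime, whose Ticks are also 100-nanosecond units and which uses the proleptic Gregorian calendar. This is a minimal sketch; the GuidEpochMath class name is hypothetical, and it compares the count against the 60-bit width of the version-1 timestamp field.

```csharp
using System;

// Arithmetic behind the "wasted" GUID timestamp range.
public static class GuidEpochMath
{
    // 100 ns intervals from the GUID epoch (Oct 15, 1582) to Jan 1, 1990.
    public static long IntervalsBefore1990()
    {
        var epoch = new DateTime(1582, 10, 15);
        var cutoff = new DateTime(1990, 1, 1);
        return cutoff.Ticks - epoch.Ticks;
    }

    // Number of bits needed to count that many intervals.
    public static double BitsToCount(long intervals) => Math.Log(intervals, 2);

    // Share of the 60-bit timestamp field's range that those intervals occupy.
    public static double FractionOf60BitRange(long intervals) => intervals / Math.Pow(2, 60);
}
```

IntervalsBefore1990() returns 128504448000000000 (148,732 days), which needs just under 57 bits to count but fills only about 11% of a 60-bit counter.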
When I first started programming Windows applications back in the early '90s, I used to read all of the KB articles that Microsoft put out. This kept me relatively well informed about some of the more advanced features in their products. It also made me aware of their limitations. Over time, there were too many articles and too little time to keep up with them, so I stopped trying.
Now the KB articles can be retrieved using RSS! Go to http://www.kbalertz.com/allKbs.aspx, drill into the category you're interested in, and subscribe using the link on the XML icon. This is a great way to keep track of developments from Microsoft.
I've written quite a few HTML scrapers (programs that read an HTML page and parse out the information it contains), and the biggest part of these programs is the string manipulation. I usually break the HTML page up into string arrays and run through the array looking for keywords. In .NET, you can break strings up using the Split method of a string object, or you can use regular expressions. I find regular expressions powerful but cryptic to write and maintain, so I use the Split method more often than not. Darren Neimke has benchmarked different methods of splitting strings in .NET.
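Here is a minimal sketch of the two approaches side by side, with a hypothetical HtmlSplitSketch class and an assumed "<tr" row marker; real pages need their own markers. It makes no performance claims (see the benchmarks for that); it just shows the trade-off: Split is simple but literal, while a regular expression can absorb variations such as attributes or upper-case tags at the cost of a more cryptic pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlSplitSketch
{
    // string.Split with a literal separator: no regex engine involved.
    public static string[] SplitPlain(string html, string delimiter)
    {
        return html.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex.Split with a pattern, e.g. "<tr[^>]*>" to match <tr>, <TR>, or <tr class='x'>.
    public static string[] SplitRegex(string html, string pattern)
    {
        var pieces = new List<string>();
        foreach (string piece in Regex.Split(html, pattern, RegexOptions.IgnoreCase))
        {
            if (piece.Length > 0) pieces.Add(piece); // drop empties, like RemoveEmptyEntries
        }
        return pieces.ToArray();
    }

    // Scan the pieces for a keyword, the way a simple scraper walks the array.
    public static List<string> FindPieces(string[] pieces, string keyword)
    {
        var hits = new List<string>();
        foreach (string piece in pieces)
        {
            if (piece.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                hits.Add(piece);
        }
        return hits;
    }
}
```

With the input "<table><tr>Boston<tr class='x'>Providence</table>", SplitPlain(html, "<tr>") leaves the second row glued to its neighbor (2 pieces), while SplitRegex(html, "<tr[^>]*>") yields "<table>", "Boston", and "Providence</table>".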
Copyright © 2010 Phil Denoncourt III. All rights reserved.