Blog Home  Home Feed your aggregator (RSS 2.0)  
What did you learn today? - Tuesday, January 09, 2007
Phil Denoncourt's Technology Rants
 
 Tuesday, January 09, 2007
Introduction to WCF by phildenoncourt

I'll be speaking at two user groups in the upcoming weeks:

Introduction to WCF (Windows Communication Foundation).

.NET 3.0 was released in November.  WCF is one piece of the 3.0 puzzle and is a substantial improvement over ASP.NET web services. 

 

Phil will show the overall architecture and how easy it is to develop WCF Services.

 

Phil Denoncourt is a .NET consultant, who over the past 5 years has developed a wide range of .NET applications and has over 20 years experience writing software. He is the leader of the New England C# User Group and has acquired the MCPD, MCTS, MCSD, MCDBA, MCSE, MCAD, MCSA and MCP+SB certifications.

 

When not coding, or spending time with his wife and 5 children, Phil is an avid fan of the Boston Red Sox and New England Patriots.

 

Jan 11 2007 at New England C# Users Group

 

Jan 18 2007 at New Hampshire .NET Users Group

Tuesday, January 09, 2007 6:13:05 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Speaking Engagements  | 
 Sunday, November 19, 2006
Sample Databases by phildenoncourt

For the past month, I've been focusing on getting up to speed on the data mining features of SQL 2005. Really amazing stuff. I'll be giving a presentation on this in the near future. One of the things that took a significant amount of time was putting together a database that I could use for testing. I didn't want to use AdventureWorks or something like that because I wanted something that had more data and more "real world".

So I downloaded the past 20 years of stock prices from a public quote server, the past 10 years of foreign currency prices, and a slew of economical data from Federal Reserve. (See picture below). The database has over 7,300 companies (I tried to get all NASDAQ, NYSE, and AMEX tickers) with over 16 million quotes.

Now I'm creating models and running predictions. For the most part, I'm able to exercise all of the data mining algorithms. I haven't found the secret formula to the stock market yet, but someday…

If anybody else is interested getting a copy of this database, let me know via the contact link. Because it's over a gigabyte in size and I don't have massive bandwidth allowances in my hosting account, it has to be transported via postal mail. PayPal me $30 to cover the cost of burning a DVD and sending it, and I'll get it out to you.

Sunday, November 19, 2006 3:45:32 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   SQL  | 
 Wednesday, September 13, 2006

I was sitting in this meeting in which I was only peripherally involved and there's only so much doodling you can do.  So I did what most other people in the meeting were doing, I started daydreaming. 

I'm a big fan of GUIDs, but there's all this noise about how slow they are because they're bigger than an int making for inefficient searching.  So I started to wonder, how much slower are GUIDs than ints? So I started to do the math, and for a table with 4 billion rows, I was coming out with only one more logical read if the key was a GUID than an int.... I know.. Didn't make sense to me.  So I put it to a practical test.

I created two tables.  One called IntWithData and the other called GuidWithData.


I then populated them with a million rows.  --As a side note, I started the scripts to populate the tables at about the same time, and they finished at the same time.  Something I think I will research further is how much slower it is to insert a record in a GUID table compared to an INT table.  Common sense tells you it's supposed be way slower to populate a GUID--

I took a look at the statistics for the index by using the following statements:
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),object_id('dbo.GuidWithData'),null,null,'DETAILED')
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),object_id('dbo.IntWithData'),null,null,'DETAILED')

That produced these results:

database_id object_id index_id partition_number index_type_desc alloc_unit_type_desc index_depth index_level avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages page_count avg_page_space_used_in_percent record_count ghost_record_count version_ghost_record_count min_record_size_in_bytes max_record_size_in_bytes avg_record_size_in_bytes forwarded_record_count
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 0 99.037 5919 1 5919 68.85677 1000000 0 0 31 31 31 NULL
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 1 96.2963 27 1 27 67.68663 5919 0 0 23 23 23 NULL
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 2 0 1 1 1 8.314801 27 0 0 23 23 23 NULL

 

database_id object_id index_id partition_number index_type_desc alloc_unit_type_desc index_depth index_level avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages page_count avg_page_space_used_in_percent record_count ghost_record_count version_ghost_record_count min_record_size_in_bytes max_record_size_in_bytes avg_record_size_in_bytes forwarded_record_count
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 0 0.423403 317 8.195584 2598 99.84113 1000000 0 0 19 19 19 NULL
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 1 0 9 1 9 46.33886 2598 0 0 11 11 11 NULL
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 2 0 1 1 1 1.420806 9 0 0 11 11 11 NULL

This is tough to read, but what you should be getting out of this is: Guid table is very fragmented.  Int table is not.  Also the Guid table had about 2.25 more pages than the Int table.  Ideally you would be running an overnight process that reorganized pages in your DB, and that would help with the fragmentation.

So now the test query.  The question I wanted to answer was:  How many logical reads are needed to read a record out of the Int table and how many are needed to read out of the Guid table?

SET STATISTICS IO ON

SELECT * FROM IntWithData WHERE ID = 619284

SELECT * FROM GuidWithData
WHERE ID = '7DD78950-D3CD-4016-8D92-738A6E0666F2' -- I had to find this value in my data

The results:
....

(1 row(s) affected)

Table 'IntWithData'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Table 'GuidWithData'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

3 logical reads. That means in a table with 1 million rows, there should be no difference between a table with Guid as a key and one that has an int.  This is because the records are searched for using a binary search algorithm.  In a perfectly balanced tree, it shouldn't take the algorithm more than 20 tries to find the key with many of those "tries" are contained in the same index page.  At some volume, and I haven't calculated exactly where, there will be a difference of one logical read, but that's it.

Hopefully, I haven't misinterpreted the results of this test.  I don't think I have.  The bottom line is that Guids take up more physical space, but you're able to find them just as fast as ints.

 

Wednesday, September 13, 2006 9:51:55 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development | SQL  | 
 Friday, August 18, 2006

I was preparing for the SQL Server Exam 70-441 (Designing Database Solutions by Using Microsoft® SQL Server™ 2005) and I kept coming across a whole bunch of small features that you don’t hear a lot about.  I’m not talking about the big ones everyone already knows about like CLR Integration, DDL triggers, XML, and ranking functions.  These are ones that took me by pleasant surprise.  I think they’re really cool, so I wanted to point them out.

 

Included Columns / Covering Indexes

Say you had a critical/oft-used report that ran a statement like:

 

SELECT Name, ProductNumber

FROM Production.Product

WHERE Style='U' AND Class='L'

 

If you wanted to put an index on the table to speed things up, you’d put an index on Style, Class, Name, and ProductNumber.  That would remove the need to go from the index page to the table page to get all the information for the query.  SQL 2005 has added the include clause to the Create Index statement.

 

CREATE INDEX IX_Product_Class_Style

ON Production.Product (Class,Style)

INCLUDE (Name, ProductNumber);

 

What this is doing is indexing just the Class and Style fields of the table.  The Name and ProductNumber are stored at the leaf nodes of the index.  This saves space in the index pages, meaning you can get more rows in an index page, reducing I/O and Disk Space.

 

Dynamic Management Views

Seriously, these are pretty cool.  Really.  If you haven’t looked at these, take the time to play around with a few. 

 

With these new views, you are able to get high visibility into the SQL Server engine.  Wondering what’s about to be written to the disk?  Query sys.dm_io_pending_io_requests.  Troubleshooting execution plans?  Use sys.dm_exec_cached_plans and sys.dm_exec_plan_attributes.  Need to see what’s running on the server right now?  sys.dm_os_tasks.  How often is an index used?  There’s sys.dm_db_index_usage_stats that will answer that question.  There are over 80 views that provide very useful information when you are troubleshooting problems or just curious as to how SQL Server organizes itself.

 

sp_create_plan_guide

Every now and then, I have a query that runs like a jackrabbit on my development machine, but runs like 3 legged turtle on a production machine.  This is usually because it decided not to use an index on the production machine due to differences in data density or because running the query with parallelism is slower with some statements.  To troubleshoot these situations, I end up modifying the stored procedure to add various hints.

 

Rather than modify the stored procedure, you can specify a hint for an existing query plan using the sp_create_plan_guide stored proc.  This way you can experiment without modifying the stored procedure and the hint doesn’t become part of your codebase.  MSDN documentation can be found here.

 

OUTPUT Clause

A lot of people already know about this pretty cool feature, but I didn’t.  In the past, if you do an insert to a table, and the table has an identity column, after the insert, you would look at @@IDENTITY or scope_identity().  That always seemed hackish to me.

 

You can now add an OUTPUT clause to the end of an INSERT, UPDATE, or DELETE statement.  That will return the rows changed by the statement.

 

INSERT INTO Person.Address

      (AddressLine1,AddressLine2,City

      ,StateProvinceID,PostalCode,rowguid,ModifiedDate)

VALUES(@AddressLine1,@AddressLine2,@City

      ,@StateProvinceID,@PostalCode,@rowguid,@ModifiedDate)

OUTPUT INSERTED.AddressID;

 

You use the pseudo tables INSERTED and DELETED, just like in triggers.

 

Storing BLOBs in Table page

Inevitability, on each project I work on, there is a requirement to have a field capable of storing more than 8k in a table.  It’s usually a field that the user can enter unlimited comments about a customer.  So I define it as a text field.  I always feel guilty about it because the user’s text will rarely be more than 100 bytes and I know it’s going to be stored on a different page.

 

In SQL 2005, if the data will fit in the same page as the rest of the record, it will place it in the same page.  This works with varchar/varbinary/nvarchar(max) and XML datatypes.  The behavior is described here at MSDN.

Persistence of Computed Columns

This is kind of neat.  We’ve had the ability to define computed columns in tables for a while.  They worked exactly as they’re named.  When the column was needed, it was computed.  Now the results of the computed column can be saved to disk.  This allows you to index computed columns.  You can also feel free to create calculated columns that can be computationally intense.

 

CREATE TABLE LineItem (

      LineItemID int IDENTITY(1,1),

      OrderID int,

      ItemDesc varchar(512),

      Qty int,

      Cost money,

      LineTotal as Qty * Cost PERSISTED

)

 

Those are all the ones I found.  What you have noticed as small new features in SQL that make life easier?

Friday, August 18, 2006 9:01:27 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development | SQL  | 
 Wednesday, June 14, 2006

Tomorrow I'll be speaking at the New Hampshire .NET User's group.  More information can be found at www.nhdn.com

6:00 - 7:00 PM Xml Serialization.  XML Serialization is the process of transforming a complex object to and from XML.  This is made straightforward with the .NET framework.  We'll take a look out how to serialize a simple object and some of the attributes used to control the output.  Then we'll take a look at some of the more advanced features like capturing processing errors, overrides and unknown elements.  Finally we'll take complete control of the serialization process using the new features of the 2.0 Framework.  Be the envy of all your coworkers by mastering XML serialization

7:00 - 8:00 PM Xml in SQL 2005.  There are a variety of different ways that SQL Server can work with XML data.  Phil will demonstrate the various methods such as the XML datatype, OpenXML, How to return XML data in queries, Annotated XSDs, and interoperability with .NET

Phil Denoncourt is a .NET consultant, who over the past 4 years has developed a wide range of .NET applications and has over 20 years experience writing software. He is the leader of the New England C# User Group and has acquired the MCPD, MCTS, MCSD, MCDBA, MCSE, MCAD, MCSA and MCP+SB certifications. When not coding, or spending time with his wife and 5 children, Phil is an avid fan of the Boston Red Sox and New England Patriots.

Wednesday, June 14, 2006 11:15:28 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Speaking Engagements  | 
 Thursday, March 09, 2006

UPGRADE: MCSD Microsoft .NET Skills to MCPD Enterprise Application Developer by Using the Microsoft .NET Framework: Part 2

Create a client application to access a remote object.

  • Create a remote object.
  • Configure a client application programmatically.
  • Configure a client application manually by using configuration files.
  • Access the remoting service by calling a remote method.
  • Call a remote method synchronously in a client application.

To create a remote object, configure the client application to use remoting.  This can be done using a config file, or programatically.  MSDN has information on setting up a client application.

You can see how to progmatically setup a client on Jibin Pan's article at c-sharpcorner

MSDN has information on how to setup a client using configuration files.

To call a remote method, create the object as usual (if you have remoting configured for the type) or use Activator.GetObject.

By default remoting calls are synchronous.  So I think they meant to say "Call a remote method asynchronously in a client application".  MSDN has information on how to do invoke a method asynchronously.

Thursday, March 09, 2006 8:52:12 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Certifications  | 
UPGRADE: MCSD Microsoft .NET Skills to MCPD Enterprise Application Developer by Using the Microsoft .NET Framework: Part 2

Create and configure a server application.

  • Create a server application domain.
  • Configure a server application programmatically.
  • Configure a server application by using configuration files.
  • Compile and build a server application.

In general, it's good to look at the Remoting FAQ at thinktecture.

To create a server app domain, look at these instructions on MSDN.  Use a configuration file, and use RemotingConfiguration.Configure to read it in.  The objects are available at that point.

To do it programically, look at this article at Express Computer

The MSDN article has instructions on how to configure the server application.

I don't see why building a server application is any different that other types of applications.  You just compile to get an executable.  Make sure that the configuration file (if you're using a config file) gets named to <app>.exe.config and is deployed with the executable.

Thursday, March 09, 2006 8:36:00 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Certifications  | 
 Tuesday, March 07, 2006
70-554 Section I Part I by phildenoncourt

UPGRADE: MCSD Microsoft .NET Skills to MCPD Enterprise Application Developer by Using the Microsoft .NET Framework: Part 2

Configure SOAP messages.

  • Configure the formatting of SOAP messages for a Web service method.
  • Specify the basic information for a Web service application.
  • Specify the bindings of a Web service application by using the WebServiceBindingAttribute attribute.
  • Configure a Web service application by using a Machine.config file.
  • Configure a Web service application by using a Web.config file.

MSDN has a section on formatting SOAP messages.  Know about the SoapDocumentMethod attribute and SoapDocumentService attribute.  Know that you can control the formatting using XML Serialization attributes.

I guess the basic information for a web service application would be specified using the WebService attribute.  You can control the name of the web service, and the namespace.

Binding defines a concrete set of operations.  It is implemented by the WebServiceBinding attribute.  You can specify what WSI spec your service conforms to (as of now, Basic Profile 1.1 or none), whether or not the service emits conformance claims, the location of the binding (defaults to your web service), and the namespace associated with the binding.  Christian Weyer has some thoughts about the attribute on his blog.

The XML Web Services repository has a good FAQ.  In in, some of the things controlled by the web/machine.config are pointed out.  Here's how to enable/disable WSDL generation.  Also you can control the location of a dynamic reference to a web service in the config file.

Tuesday, March 07, 2006 8:16:16 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Certifications  | 
 Thursday, March 02, 2006

PRO: Designing and Optimizing Data Access by Using Microsoft SQL Server 2005

Design client libraries to write applications that administer a SQL Server service.

  • Design server management objects (SMO) applications.
  • Design replication management objects (RMO) applications.
  • Design automation management objects (AMO) applications.
  • Design SQL Server Networking Interface (SNI) for asynchronous queries.

Here's MSDN's section on SMO.  Darshan Singh has an overview of SMO at YukonXML.comDavid Hayden has an example of how to create a table using SMO.

MSDN's section on RMO.  There is not a lot of community activity in respects to RMO.  Here's my take.  Probably good to know how to create a publication (EnabledTransPublishing on ReplicationDatabase, Create a TransPublication object), create a subscription (TransPullSubscription, or TransSubscription), how to cause a syncronization (TransSynchronizationAgent).  Keep in mind I've never use these objects, these are the ones that strike me as important.

It appears that AMO really means Analysis Management Objects.  There appears to be no reference anywhere to "automation management objects" anywhere.  Here's MSDN's section on AMOLiu Tang and Paul Bradley wrote a overview for MSDN magazine.

SNI For Async?  It looks like SNI is used for tracing.  Here's an article at MSDN that mentions SNI.

Next up-> MARS

Thursday, March 02, 2006 1:43:43 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Certifications  | 
70-442 - Part II - Caching by phildenoncourt

PRO: Designing and Optimizing Data Access by Using Microsoft SQL Server 2005

Design caching strategies.

  • Select ADO.NET caching.
  • Design custom caching functionality.
  • Design a refresh strategy for cached data.

ADO.NET doesn't have a cache.  ASP.NET does.  ADO.NET retrieves results into objects like DataSets that can be cached in the ASP.NET cache or your own Cache app.

Designing your caching functionality seems like a lot of work.  You have to manage the size of the cache, object expiration, object invalidation.  I use the Enterprise library and that provides a lot of the plumbing.

Refreshing cached data is done using Query Notifications via the SqlDependency or SqlNotificationRequest objects.  There is a good article on MSDN outlining how to set this up.  Steve Smith mentions the changes that have occured since Beta 2 on his blog.

Next up->Client Libraries

Thursday, March 02, 2006 1:07:33 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Certifications  | 

PRO: Designing and Optimizing Data Access by Using Microsoft SQL Server 2005

Design appropriate data access technologies.

Design an appropriate data access object model.

Design a cursor strategy for a data access component.

  • Decide when to use cursors.
  • Decide how to maximize cursor performance.
  • Detect which applications are using cursors and evaluate whether to remove them.