Blog Home  Home Feed your aggregator (RSS 2.0)  
What did you learn today? - Development
Phil Denoncourt's Technology Rants
 
 Saturday, June 23, 2007

From Pat Tormey, Director of NHDN:

Summer Recess..And Study Groups

During the summer I'll be forming Study Groups to meet for particular topics.

Personally I'll be studying the SmartClient Software Factory. If you are interested, drop me a note.. We'll be meeting at my office in Derry NH on Monday evenings

If you'd like to host a Study Group or have an idea for one please drop me a note and I'll see if we can find some people in your area with the same interests.

Saturday, June 23, 2007 4:44:41 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development  | 
 Wednesday, March 21, 2007
Reviewing Databases by phildenoncourt

There is plenty of guidance on reviewing code.  (Look at CodeFrisk.Com/Guidance.aspx for some links).   I’ve found that the one absolutely most critical piece of a traditional application never undergoes any type of review.  Most applications live and die by their database, and yet it rarely gets reviewed.  If you’re lucky (although you might not think so), you’ll have a DBA check things over, but they are mostly concerned with performance.  Maintainability and adherence to standards are not at the top of their list.  

 

I’ve listed some of the things I look at when I review a database:

-Appropriate Normalization

It’s tough to say what is and isn’t appropriate normalization.  You’ll know it when you see it.  Stuff like having a separate table for the US states or one called Gender is usually a red flag of having gone too far.  Conversely, having the same field in multiple tables is usually a sign that normalization hasn’t gone far enough. 

 

--Is all access through stored procedures

Most development shops have a policy that says, “All data access shall be done through stored procedures”.  If you have that policy, check to make sure this is being adhered to.  An easy way to check this is to deny access to the tables and allow access to the stored procedures.

 

--Look for premature optimization

I’ve seen times when developers are using Join hints or locking hints in their query.  “If I force the query to use a merge join, it goes 5x faster”.  Meaning it goes 5 times faster on the development box which has far fewer rows and 3 fewer processors than the production machine.  Once it goes the to production machine, there’s a good chance that SQL will decide upon a different plan to execute the statement, making the hint destructive on the production machine

 

--Are objects secured and scripted

Eventually, when you roll the database into a production environment, access to the database will (or should) be locked down.  Is the development environment the same?  Are the scripts that you have under source control (you do have the database creation scripts under source control don’t you?)  include securing of the object.

 

--Are updates applied in the same order

I can think of no better way to create an application that suffers from chronic deadlock situations than to have one stored procedure update tables A, B, & C (in that order) and have another stored procedure update tables C, B, & A (in that order).  Updates should always be applied in the same order.  Otherwise, you’re increasing the chance that a row in A will be locked by one stored proc while waiting for a row in table C locked by the other stored proc.  Generally the order is enforced naturally because you have to respect foreign keys, but every now and then I come across this.

 

--Biblical stored procedures

By biblical, I mean volume, not divinely inspired.  I’m referring to stored procedures that use up more than one printer cartridge if you were to print it.  Nowadays when you see a large stored procedure, it’s one of two things:

1) Embedded business logic – This seems to be a remnant of client server programming where you were forced to put your business logic in the stored procedure.  It’s arguable about whether or the database is the best place to keep your business logic.  Most people (including me) believe it’s not.  Business logic should be kept in a business logic component that validates, enforces and calculates.

2) Poorly defined schema.  Most validations that you see taking place in stored procs could be controlled by using schema constructs like check constraints, triggers, foreign keys and defaults.  Put that type of information in the schema where it can be enforced consistently everywhere rather than burdening the stored procedure.

 

--Does the Development database match what’s in source control

To reinforce this, database schema objects need to be kept under source control.  Having a database backup plan isn’t sufficient.  Putting the objects in source control allow you to manage changes and releases much more effectively.  Otherwise you’ll be struggling to determine which objects have changed in the development database and need to be deployed, and figuring out who added a field to table.  Keeping the schema under source control used to be a very manual process, but now tools like Visual Studio for Database Professionals make this painless.

 

--Cloned stored procedures / Views.

This is very common.  When you need data, most developers don’t look to see if there is an existing stored procedure or view that satisfies their needs.   They’ll create a brand new one.  Then you end up with a database that has 3 different stored procedures that get a customer record by its ID.  Now all these duplicated procs have to be maintained.  Best prevention for this is to have a strict naming convention for your procedures. 

 

Those are some of the quick patterns I look for when reviewing a database.  What are the kind of things that stand out when you look over a database?

 

Wouldn’t you feel comfortable having your code reviewed by an expert?  Go to CodeFrisk.com to see how I can proofread your code at a reasonable price.

Wednesday, March 21, 2007 3:59:42 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development | SQL  | 
 Sunday, March 18, 2007
Code Reviews by phildenoncourt

As a developer, I love to write code.  Getting involved with a problem, putting my head down, coming up a solution and getting immersed in implementing it is an awesome feeling.

 

I also enjoy reading code.  I like to look at a set of code and try to understand what the developer was thinking.  You can tell a lot about how a person thinks by reading their code.  To me, participating in code reviews is a lot of fun. 

 

I originally wrote these articles 6 months ago when I was about to launch a new service to review code (For details, go to http://www.codefrisk.com/).  I’m now much too busy to review code, but these articles might be of value, so I’m uploading them now.  I’ll be posting a set of articles about code reviews, best practices, coding standards…  Here’s the first article in the series:

 

Coding Standards – How to write a useful Coding Standard document

 

Coding standards are an agreement in a development team on how the code for an application will be structured.  They promote consistency and easier maintenance.    A well-written and enforced standards policy will reduce the time that it takes for new developers to feel comfortable with an application.

 

I’ve worked with many organizations over the years, most of which have some sort of coding standards documentation.  The needs of each organization are very different.  Some organizations sold their source code, so they needed to make sure there was a very high level of consistency.  Others were in regulated industries and needed to conform to various practices.   Others were small shops that didn’t see tremendous value in having standards.

 

The standards document had varying degrees of success.  Some negative factors in coding standards I’ve seen are:

  • Aggressively guarded by one person
  • Enforcing rules that were necessary in older technologies, but no longer hold relevance (Especially if they were burned by the problem)
  • Too slim
  • Too thick
  • Too strict
  • Begun, but never completed.
  • Finished, but never enforced.

 

I’ve written a few coding standards docs and want to share what I’ve found useful, and what you should try to avoid.

 

First of all, if you’re the person tasked with writing the documentation, make sure that you involve the rest of the development team as much as possible.  You shouldn’t take the Moses approach; climb a mountain and come down with a set of commandments.  If the developers on your team aren’t consulted on the coding standards, you’ll see a lot of them not adhering to them.  These aren’t your coding standards; they are the teams coding standards.

 

Recognize that you aren’t writing a document with a stone and chisel.  Coding standards must evolve as your team has learned new techniques.  It’s also not uncommon to see coding standard docs that never get finished because the author feels the document must be 100% “complete” before they release it.  Take an iterative approach and get the document out as soon as possible and continue to refine it.  It’s impossible to get it all right the first, second, or even tenth time.  You’ll probably find that most code reviews will result in changes to the standards document.

 

Kick off a discussion for an hour or two and brainstorm with the development team about possible standards.  There will be contention.  If the team can’t come to an agreement after 5 minutes, table the discussion point for later.  If people have time to think about something, they will usually be able to justify or back off on their opinion.    The important thing is to limit the time of the discussion in the interests of getting an initial version within the same day.  Good developers will talk about coding all day.

 

Areas to discuss would include:

  • Naming standards – Camel, Pascal, Hungarian, Dewey Decimal System?
  • Documentation expectations – XML Documentation, when/where to put comments, class headers…
  • Design Patterns that are expected to be used – Factories, Observers, Decorators…
  • Automated Unit Testing strategy – TDD? Tool type, Code Coverage…
  • Exception handling – Handling, Reporting…
  • Use of constants/resource files – Yes, No?  One resource for the entire project, one constant class?
  • Complexity limits for methods - > 25 is a lot of branches
  • Data access methods - Stored Procs, Dynamic SQL, DataSets, ORM, Async, Transaction strategy…
  • Common components to use – Enterprise Library, Log4net, Infragistics…
  • Localization strategy -  Needed?  Roll your own or use .NET framework...
  • Versioning – Versioning assemblies, strong name, signing, delay signing…
  • Build strategy – Nightly builds, one solution or many solutions, creating references
  • !Code Formatting

 

For the most part try to stay away from talking about code formatting.  Formatting standards inevitably come down to one developer’s preference over another.  Capable developers will be able to read the code regardless of how it is formatted. 

 

After you’ve had your discussion with the team, hole yourself up in your office/cube and write up the standards doc.  Unless you’re a dreadfully slow typist, this shouldn’t take more than 1 – 3 hours.

 

The important thing is to make sure you get feedback from your team on the document.  If they don’t feel involved and disagree with a particular piece, they will exhibit a little bit of passive-aggressive behavior and not adhere to the standard.

 

The Coding Standards document sets up expectations at code reviews.  It is a declaration of what is expected from a developer when they submit code.  Without some set of coding standards, code reviews can quickly become counterproductive.

 

Coding standard docs aren’t hard, aren’t intrusive, and are a valuable resource.  When there is a set of guidelines to use when coding, it is much easier to bring new developers into the project.  The benefits justify the one day of effort it takes to put one in place.

 

 

Wouldn’t you feel comfortable having your code reviewed by an expert?  Go to CodeFrisk.com to see how I can proofread your code at a reasonable price.

Sunday, March 18, 2007 11:18:14 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development  | 
 Wednesday, September 13, 2006

I was sitting in this meeting in which I was only peripherally involved and there's only so much doodling you can do.  So I did what most other people in the meeting were doing, I started daydreaming. 

I'm a big fan of GUIDs, but there's all this noise about how slow they are because they're bigger than an int making for inefficient searching.  So I started to wonder, how much slower are GUIDs than ints? So I started to do the math, and for a table with 4 billion rows, I was coming out with only one more logical read if the key was a GUID than an int.... I know.. Didn't make sense to me.  So I put it to a practical test.

I created two tables.  One called IntWithData and the other called GuidWithData.


I then populated them with a million rows.  --As a side note, I started the scripts to populate the tables at about the same time, and they finished at the same time.  Something I think I will research further is how much slower it is to insert a record in a GUID table compared to an INT table.  Common sense tells you it's supposed be way slower to populate a GUID--

I took a look at the statistics for the index by using the following statements:
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),object_id('dbo.GuidWithData'),null,null,'DETAILED')
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),object_id('dbo.IntWithData'),null,null,'DETAILED')

That produced these results:

database_id object_id index_id partition_number index_type_desc alloc_unit_type_desc index_depth index_level avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages page_count avg_page_space_used_in_percent record_count ghost_record_count version_ghost_record_count min_record_size_in_bytes max_record_size_in_bytes avg_record_size_in_bytes forwarded_record_count
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 0 99.037 5919 1 5919 68.85677 1000000 0 0 31 31 31 NULL
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 1 96.2963 27 1 27 67.68663 5919 0 0 23 23 23 NULL
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 2 0 1 1 1 8.314801 27 0 0 23 23 23 NULL

 

database_id object_id index_id partition_number index_type_desc alloc_unit_type_desc index_depth index_level avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages page_count avg_page_space_used_in_percent record_count ghost_record_count version_ghost_record_count min_record_size_in_bytes max_record_size_in_bytes avg_record_size_in_bytes forwarded_record_count
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 0 0.423403 317 8.195584 2598 99.84113 1000000 0 0 19 19 19 NULL
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 1 0 9 1 9 46.33886 2598 0 0 11 11 11 NULL
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 2 0 1 1 1 1.420806 9 0 0 11 11 11 NULL

This is tough to read, but what you should be getting out of this is: Guid table is very fragmented.  Int table is not.  Also the Guid table had about 2.25 more pages than the Int table.  Ideally you would be running an overnight process that reorganized pages in your DB, and that would help with the fragmentation.

So now the test query.  The question I wanted to answer was:  How many logical reads are needed to read a record out of the Int table and how many are needed to read out of the Guid table?

SET STATISTICS IO ON

SELECT * FROM IntWithData WHERE ID = 619284

SELECT * FROM GuidWithData
WHERE ID = '7DD78950-D3CD-4016-8D92-738A6E0666F2' -- I had to find this value in my data

The results:
....

(1 row(s) affected)

Table 'IntWithData'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Table 'GuidWithData'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

3 logical reads. That means in a table with 1 million rows, there should be no difference between a table with Guid as a key and one that has an int.  This is because the records are searched for using a binary search algorithm.  In a perfectly balanced tree, it shouldn't take the algorithm more than 20 tries to find the key with many of those "tries" are contained in the same index page.  At some volume, and I haven't calculated exactly where, there will be a difference of one logical read, but that's it.

Hopefully, I haven't misinterpreted the results of this test.  I don't think I have.  The bottom line is that Guids take up more physical space, but you're able to find them just as fast as ints.

 

Wednesday, September 13, 2006 9:51:55 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development | SQL  | 
 Friday, August 18, 2006

I was preparing for the SQL Server Exam 70-441 (Designing Database Solutions by Using Microsoft® SQL Server™ 2005) and I kept coming across a whole bunch of small features that you don’t hear a lot about.  I’m not talking about the big ones everyone already knows about like CLR Integration, DDL triggers, XML, and ranking functions.  These are ones that took me by pleasant surprise.  I think they’re really cool, so I wanted to point them out.

 

Included Columns / Covering Indexes

Say you had a critical/oft-used report that ran a statement like:

 

SELECT Name, ProductNumber

FROM Production.Product

WHERE Style='U' AND Class='L'

 

If you wanted to put an index on the table to speed things up, you’d put an index on Style, Class, Name, and ProductNumber.  That would remove the need to go from the index page to the table page to get all the information for the query.  SQL 2005 has added the include clause to the Create Index statement.

 

CREATE INDEX IX_Product_Class_Style

ON Production.Product (Class,Style)

INCLUDE (Name, ProductNumber);

 

What this is doing is indexing just the Class and Style fields of the table.  The Name and ProductNumber are stored at the leaf nodes of the index.  This saves space in the index pages, meaning you can get more rows in an index page, reducing I/O and Disk Space.

 

Dynamic Management Views

Seriously, these are pretty cool.  Really.  If you haven’t looked at these, take the time to play around with a few. 

 

With these new views, you are able to get high visibility into the SQL Server engine.  Wondering what’s about to be written to the disk?  Query sys.dm_io_pending_io_requests.  Troubleshooting execution plans?  Use sys.dm_exec_cached_plans and sys.dm_exec_plan_attributes.  Need to see what’s running on the server right now?  sys.dm_os_tasks.  How often is an index used?  There’s sys.dm_db_index_usage_stats that will answer that question.  There are over 80 views that provide very useful information when you are troubleshooting problems or just curious as to how SQL Server organizes itself.

 

sp_create_plan_guide

Every now and then, I have a query that runs like a jackrabbit on my development machine, but runs like 3 legged turtle on a production machine.  This is usually because it decided not to use an index on the production machine due to differences in data density or because running the query with parallelism is slower with some statements.  To troubleshoot these situations, I end up modifying the stored procedure to add various hints.

 

Rather than modify the stored procedure, you can specify a hint for an existing query plan using the sp_create_plan_guide stored proc.  This way you can experiment without modifying the stored procedure and the hint doesn’t become part of your codebase.  MSDN documentation can be found here.

 

OUTPUT Clause

A lot of people already know about this pretty cool feature, but I didn’t.  In the past, if you do an insert to a table, and the table has an identity column, after the insert, you would look at @@IDENTITY or scope_identity().  That always seemed hackish to me.

 

You can now add an OUTPUT clause to the end of an INSERT, UPDATE, or DELETE statement.  That will return the rows changed by the statement.

 

INSERT INTO Person.Address

      (AddressLine1,AddressLine2,City

      ,StateProvinceID,PostalCode,rowguid,ModifiedDate)

VALUES(@AddressLine1,@AddressLine2,@City

      ,@StateProvinceID,@PostalCode,@rowguid,@ModifiedDate)

OUTPUT INSERTED.AddressID;

 

You use the pseudo tables INSERTED and DELETED, just like in triggers.

 

Storing BLOBs in Table page

Inevitability, on each project I work on, there is a requirement to have a field capable of storing more than 8k in a table.  It’s usually a field that the user can enter unlimited comments about a customer.  So I define it as a text field.  I always feel guilty about it because the user’s text will rarely be more than 100 bytes and I know it’s going to be stored on a different page.

 

In SQL 2005, if the data will fit in the same page as the rest of the record, it will place it in the same page.  This works with varchar/varbinary/nvarchar(max) and XML datatypes.  The behavior is described here at MSDN.

Persistence of Computed Columns

This is kind of neat.  We’ve had the ability to define computed columns in tables for a while.  They worked exactly as they’re named.  When the column was needed, it was computed.  Now the results of the computed column can be saved to disk.  This allows you to index computed columns.  You can also feel free to create calculated columns that can be computationally intense.

 

CREATE TABLE LineItem (

      LineItemID int IDENTITY(1,1),

      OrderID int,

      ItemDesc varchar(512),

      Qty int,

      Cost money,

      LineTotal as Qty * Cost PERSISTED

)

 

Those are all the ones I found.  What you have noticed as small new features in SQL that make life easier?

Friday, August 18, 2006 9:01:27 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development | SQL  | 
 Thursday, October 27, 2005
Last night I posted a webcontrol that I wrote to my website that makes life more difficult for spammers. It's a substitute for hyperlink control in ASP.NET named the obscure hyperlink control

One of the ways that spammers get email addresses is that they have programs that spider the web, looking for email addresses embedded in webpages. They target forum based sites because people are more likely to leave their email addresses there. This has caused people to start leaving their email addresses in cryptic formats (for example: me {at} mydomain.com). I find these techniques annoying as an end user trying to contact someone, but I also have to believe that spammers have caught on and look for variants with the word "at" in them. The obscure hyperlink control can be used for any hyperlink, mailto or http. Besides thwarting spammers, another use of the control would be to link to an objectionable site without contributing to its search engine rank.

What the obscure hyperlink control does is scrambles (note - I'm not saying encrypt) the hyperlink when the page is being created on the webserver using a random technique. An scrambled example of my email address is 'mcstiostucoe@ipolamit:hldnnorascae.o'. You can see a functioning example here. A matching javascript function is added to the webpage that unscrambles the hyperlink when the user clicks on it. When you view the source of the webpage, the link is removed, and an onClick handler is added to the hyperlink. Nowhere will you see the text of the hyperlink. It is present in the onClick handler, but it is not very legible. The Url is not stored in Viewstate, so it can't be taken from there, either.

Here are pros & cons of this control:
Pros:
  • Easy to use (works exactly the same as the existing hyperlink control)
  • The hyperlink information is not in the href attribute, but in the onClick (an area that spammers don't always pay atttention to)
  • The diversity of scrambling algorithms makes it difficult for spammers to target a specific implementation
  • Doesn't require a lot of server resources
Cons:
  • Doesn't completely prevent spammers from getting email addresses. A determined spammer could reverse engineer the control. This is just adds a roadblock for spammers.
  • Requires that the user's browser supports javascript and that it is enabled.
  • Limited number of scrambling algorithms. Right now there are 5. If this fills a need, I intend to add more, but it will still be a finite number.
Thursday, October 27, 2005 4:02:31 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development | DotNet | ASP.NET  |